Decades of DNA sequencing data from UK child development study released

04 Mar, 2025
Newsdesk
The first resource containing high-resolution DNA sequencing data for over 37,000 children and parents collected over multiple decades from across the UK is now available to researchers worldwide.
Professor Matthew Hurles, Director of the Sanger Institute. Credit – Wellcome Sanger Institute.

The data release is led by the Wellcome Sanger Institute, the Children of the 90s study (also known as ALSPAC), the Millennium Cohort Study, and Born in Bradford, and supported by the Medical Research Council and the Economic and Social Research Council.

The work is supported by the ongoing efforts of Population Research UK, a UK-wide initiative led by teams at the University of Bristol and University College London, which aids longitudinal population studies by working to coordinate and connect the current research landscape.

Now available on the European Genome-phenome Archive, these high-quality genomic data can be used in combination with the existing longitudinal health and survey information provided by participating families.

These combined data resources offer the scientific community the opportunity to gain valuable insights in areas ranging from population genetics to the social sciences.

For example, they could be used to investigate the impact of genetic variation on neurodevelopmental conditions or childhood obesity, and how these effects are influenced by environmental factors.

Longitudinal research follows large numbers of participants over multiple years, repeatedly examining them at regular time points through, for example, blood tests, body measurements, and health questionnaires, to detect changes over time.

Previously, large DNA sequence datasets have typically focused on children with rare conditions or adult population cohorts. This new data release focuses on sequencing ‘birth cohorts’, which are population-based cohorts of people followed from birth through to adolescence or early adulthood.

To produce this latest data release, researchers at the Sanger Institute sequenced all 20,000 genes in the human genome, an approach known as exome sequencing, in samples from 8,436 children and 3,215 parents from the Children of the 90s study, 7,667 children and 6,925 parents from the Millennium Cohort Study (MCS), and 8,784 children and 2,875 parents from Born in Bradford (BiB).
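As a quick check on how the per-cohort figures relate to the headline number of over 37,000 participants, the counts can be added up. The short Python sketch below uses only the numbers quoted in this article; the dictionary layout and the `totals` helper are illustrative names, not part of the data release.

```python
# Exome-sequenced participants per cohort, copied from the figures in this article.
# The structure and names here are illustrative only.
cohort_counts = {
    "Children of the 90s (ALSPAC)": {"children": 8436, "parents": 3215},
    "Millennium Cohort Study (MCS)": {"children": 7667, "parents": 6925},
    "Born in Bradford (BiB)": {"children": 8784, "parents": 2875},
}


def totals(counts):
    """Return (children, parents, everyone) summed across all cohorts."""
    children = sum(c["children"] for c in counts.values())
    parents = sum(c["parents"] for c in counts.values())
    return children, parents, children + parents


children, parents, everyone = totals(cohort_counts)
print(f"children: {children:,}  parents: {parents:,}  total: {everyone:,}")
# prints: children: 24,887  parents: 13,015  total: 37,902
```

The total of 37,902 is consistent with the "over 37,000 children and parents" stated at the top of the article.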

These three UK longitudinal birth cohort studies are internationally recognised, and data from these cohorts have already been used to study the contribution of common genetic variants to phenotypes ranging from childhood obesity to parental nurturing behaviours, anxiety and depression.

For example, using Children of the 90s data, researchers found that a variant in a gene called MC4R is associated with increased weight across childhood. Studies like this could help design effective weight management interventions and change the way society views obesity.

That specific study used targeted DNA sequencing of the MC4R gene, whereas the new exome sequencing data reported here will allow similar investigations of other genes in the human genome. This will help drive more discoveries and research that could benefit human health.

The team has made the anonymised data as accessible as possible to approved researchers, including drafting a data note and other materials to help support its use by those who are less familiar with large-scale sequencing data.

In the coming months, this DNA sequence data resource will be expanded to encompass all participants in these cohorts as well as additional cohorts. The value of these data will be enhanced by harmonising them across the different cohorts, providing a more powerful resource than could be achieved by one study in isolation.

Dr Carl Anderson, Interim Head of Human Genetics at the Wellcome Sanger Institute, said: “Longitudinal population studies from the UK have already had a huge impact on biomedical research worldwide.

“This significant addition of whole exome sequencing data will further transform our understanding of the development of complex traits and diseases across the life course.”

Dr Richard Evans, Interim Head of Population Health Sciences at the Medical Research Council, added: “The UK's cohorts and longitudinal population studies are an extraordinary national asset, made possible by the participation of a diverse range of people.

“The rich data and samples from these studies, when combined with whole exome sequencing, can unlock new research questions and insights into human society, development, health and ageing.

“MRC’s funding is part of our overall investment in understanding the drivers of disease to enable precision prevention and personalised treatments, and maximising existing infrastructure to ensure real value for money.

“This work aligns perfectly with a new exciting national resource that is supported by MRC and ESRC, Population Research UK, which is all about coordinating and leveraging UK cohorts.”

Professor Matthew Hurles, Director of the Wellcome Sanger Institute, commented: “Great science is built on collaboration and this release would not have been possible without the engagement of the families themselves, the hard work of teams managing these longitudinal studies, sustained investment in these cohorts – especially from Wellcome and the Medical Research Council – the sequencing and data analysis power of the Wellcome Sanger Institute and the support of Population Research UK.

“We aim to continue to build on this resource and provide high-quality, accessible genomic data for researchers worldwide. This initiative further exemplifies the vast potential of bringing together the UK’s life science assets including committed research participants, researchers, governmental and charitable funding agencies, and genomic and computational capabilities.”

The data are available to approved researchers worldwide via the European Genome-phenome Archive (EGA). To access the data, visit https://ega-archive.org/

The EGA study and dataset accession numbers are:

ALSPAC (study: EGAS00001005273): dataset EGAD00001015371

MCS (study: EGAS00001007789): dataset EGAD00001015372

BiB (study: EGAS00001006978): dataset EGAD00001015370
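
For researchers who want to keep these identifiers in a script, the accession numbers listed above can be held in a small lookup table. The Python sketch below is illustrative only: the names `EGA_ACCESSIONS` and `accession_kind` are hypothetical, and the pattern it checks (an "EGAS" or "EGAD" prefix followed by 11 digits) is an assumption inferred from the accessions in this article, not an official EGA specification.

```python
import re

# Accessions copied from the list above; the dictionary layout is illustrative.
EGA_ACCESSIONS = {
    "ALSPAC": {"study": "EGAS00001005273", "dataset": "EGAD00001015371"},
    "MCS": {"study": "EGAS00001007789", "dataset": "EGAD00001015372"},
    "BiB": {"study": "EGAS00001006978", "dataset": "EGAD00001015370"},
}

# Assumed shape: "EGAS" (study) or "EGAD" (dataset) followed by 11 digits.
ACCESSION_PATTERN = re.compile(r"EGA([SD])\d{11}")


def accession_kind(accession):
    """Return "study" or "dataset" for a well-formed accession, else None."""
    match = ACCESSION_PATTERN.fullmatch(accession)
    if match is None:
        return None
    return "study" if match.group(1) == "S" else "dataset"


# Confirm every stored accession has the expected kind before using it.
for cohort, ids in EGA_ACCESSIONS.items():
    for kind, accession in ids.items():
        assert accession_kind(accession) == kind, (cohort, accession)
```

A check like this catches transcription slips, such as a study accession pasted where a dataset accession is expected, before an access request is submitted.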