Researchers develop new approach to document genetic ancestry
University of Glasgow researchers have helped to develop a new method for understanding the relationships between different DNA sequences and where they come from. This information has widespread applications, from understanding the development of viruses, such as SARS-CoV-2, the strain of coronavirus that causes COVID-19, to precision medicine, an approach to disease treatment and prevention that takes into account individual genetic information.
The study, led by the Big Data Institute, is published in Genetics and is the featured paper in the September 2024 edition.
Genetics is rapidly becoming part of our everyday lives. Nearly every week sees another newspaper headline about genetics and human ancestry, with huge datasets of DNA sequences routinely generated and used for medical study.
We can make sense of this genomic big data by working out the historical process that created it - in other words, where the DNA sequences came from. If we take a small section of someone’s DNA we know it must have come from one of their two parents in the last generation, and previously from one of their four grandparents in the generation before that, and so on. This means we can represent the history of different sections of DNA by tracing them backwards through time.
If we do this for a large set of DNA sequences from different people, we can build up a set of genetic "family trees", a genealogy of DNA sequences. This grand network of inheritance is sometimes called an ancestral recombination graph (ARG). Previous work by the same research group has shown that such networks can be used not only to illuminate the history of our genome, but also to compress DNA data and speed up genetic analyses.
Lead author Dr Yan Wong, an evolutionary geneticist at the Big Data Institute, said: "There has been surprisingly little consensus on exactly how to represent such an ancestral recombination graph on a computer. In this study, we outline a simple and efficient encoding of genetic genealogies in which each ancestor can be thought of as a fragmentary length of DNA, or ‘ancestral genome’ at some point in the past. The history of today’s genetic sequences is traced back through those ancestral genomes, keeping track of which chunks of DNA were inherited from which ancestors."
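As a rough illustration of what such an encoding might look like, the sketch below stores each genome as a numbered node and each inherited chunk of DNA as a record linking a child genome to a parent genome over a range of positions. This is not the researchers' software: the `Edge` class, the `parents_at` and `trace_back` functions, the genome length of 100 and the example genealogy are all hypothetical, chosen only to show the idea. It also assumes that each position in a genome is inherited from at most one recorded parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One inherited chunk of DNA (hypothetical record layout)."""
    left: int    # first position of the chunk (inclusive)
    right: int   # end of the chunk (exclusive)
    parent: int  # id of the ancestral genome the chunk came from
    child: int   # id of the genome that inherited the chunk


def parents_at(edges, child, position):
    """Return the ids of genomes from which `child` inherited `position`."""
    return [
        e.parent
        for e in edges
        if e.child == child and e.left <= position < e.right
    ]


def trace_back(edges, sample, position):
    """Follow one position backwards in time through ancestral genomes."""
    path = [sample]
    current = sample
    while True:
        found = parents_at(edges, current, position)
        if not found:
            return path  # no recorded ancestor: the oldest genome reached
        current = found[0]  # assumption: one parent per position
        path.append(current)


# Made-up genealogy: genomes 0 and 1 are present-day samples,
# genomes 2, 3 and 4 are ancestral genomes. Sample 0 is a mix of
# ancestors 2 and 3, with the switch at position 50.
edges = [
    Edge(0, 50, parent=2, child=0),
    Edge(50, 100, parent=3, child=0),
    Edge(0, 100, parent=3, child=1),
    Edge(0, 100, parent=4, child=2),
    Edge(0, 100, parent=4, child=3),
]

print(trace_back(edges, 0, 10))  # [0, 2, 4]
print(trace_back(edges, 0, 75))  # [0, 3, 4]
```

In this toy example the two halves of sample 0 reach the shared ancestor 4 through different intermediate genomes, which is the kind of chunk-by-chunk bookkeeping the quote describes.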
Dr Anastasia Ignatieva, of the University of Glasgow’s School of Mathematics & Statistics, is a co-author of the study and contributed to developing the new methodology. Dr Ignatieva said: "The genetic relationships between sampled individuals can be represented by a graph called the genetic genealogy.
"Reconstructing genealogies from sequencing data is an extremely active topic of current research in population genetics, since they allow us to learn about the evolutionary processes that shape genetic diversity. Our work presents a universal approach for encoding genealogies as graphs, allowing for easier sharing and comparison of results produced by different groups of scientists all over the world."
By using this simple scheme, recording genome-to-genome transmission of information, the study shows that the same genetic ancestry can be stored to different degrees of precision. This means relationships between different DNA sequences can be represented without having to know or guess the precise timing of joins and splits that underlie the true history of inheritance. The researchers also show that their description of genetic inheritance is flexible enough to deal with the wide variety of different methods that researchers currently use to reconstruct genetic history.
The approach allows scientists to store and analyse large amounts of genetic data on a standard laptop, and it generalises to any species of life on Earth. For example, it forms the basis of a ‘unified genealogy’ of over 7,000 publicly available whole human genome sequences that the researchers released previously. They are currently creating a genetic genealogy of millions of SARS-CoV-2 genomes, collected over the span of the coronavirus pandemic, which will allow analysis of the recent history of the virus, pinpointing the emergence of novel mixed (or "recombinant") strains.
Dr Wong added: "We hope that this formal standard for how to represent genetic genealogies can help to unify the field of genetic history and make it easier for scientists to analyse, share and compare results. This will be crucial as we move into an era of genomic medicine, where genetic data will be used to diagnose and treat diseases, and where understanding the history of our genomes will be key to understanding our health and ancestry."