Macrogenetics: A New Era of Biology With Big Data

Thanks to staggeringly large, high-resolution data sets, biologists can understand genetic diversity patterns across scales never before possible.

Since its beginning in the 1920s, the field of population genetics has had a data problem. The field’s purpose has always been to study the changes and flow of genes across time, location, and taxa, and to identify the responsible evolutionary forces. Doing so was difficult until recently, when the advent of big data began to change that.

In the early years, founders Wright, Haldane, and Fisher had no way to analyze actual genes on a molecular level, meaning their work was largely theoretical. For example, Fisher’s 1930 The Genetical Theory of Natural Selection, one of the field’s seminal works, successfully combined Mendelian genetics and Darwin’s theory of natural selection, a major stepping stone, although it had no molecular data to draw on. It wasn’t until decades later that the field advanced, thanks to breakthroughs such as Watson and Crick’s 1953 description of DNA’s double helix structure, which drew on X-ray crystallography. This opened the door to various molecular analysis techniques, rocketing the field forward, as data could now be collected and analyzed on the genes themselves.

Since then, data collection methods have continued to evolve, leading to a vast trove of information. Today, the field has a new problem: there’s too much information. So much data has been collected on gene markers, individual animals, groups, subgroups, entire species, and their many habitats that traditional data analysis techniques can no longer cope. The field has now turned to big data, a loose term for the many newer analysis techniques built to handle the three Vs: volume, velocity, and variety. Doing so has allowed the field to study the changes and flow of genes on a larger scale than ever before, revealing new and surprising results.

Solving One of the Oldest Riddles

Theory predicts that species with high birth rates and low parental involvement (r-selected) should have higher genetic diversity than species with low birth rates and high parental involvement (K-selected). Many insects, for example, are r-selected, whereas humans are K-selected. In theory, this idea makes sense: the more offspring a species produces, the more opportunities its genes have to spread, mutate, and evolve. Humans, on the other hand, have a genetic diversity that “is substantially lower than that of many other species, including our nearest evolutionary relative, the chimpanzee.” However, researchers in the past have come up short when trying to prove this idea, as it requires crunching an enormous amount of data.

In one of the first examples of applying big data to population genetics, researchers looked at “the genome-wide diversity of 76 non-model animal species by sequencing the transcriptome of two to ten individuals in each species.” These species stretched across a wide range of taxa, adding even more complexity to an extremely large data set. By using big data techniques, they found that genetic diversity was not dependent on geography, invasive status, or many other factors. Instead, they found that genetic diversity “was accurately predicted by key species traits related to parental investment: long-lived or low-fecundity species with brooding ability were genetically less diverse than short-lived or highly fecund ones.”

Big data has made it possible to solve one of the oldest riddles in population genetics. The authors of this study also believe that it will open the door to many other solutions, with immediate and long-term effects on conservation policy.

Humans’ Impact on Genetic Diversity

Another study used big data to look at how humans have affected the genetic diversity of other species. It’s widely accepted in the scientific community that we are in the middle of the sixth mass extinction. It’s often called the Anthropocene extinction, as it is largely being driven by human activities such as overfishing, burning fossil fuels, and plastic contamination, and it’s characterized by a sharp loss of biodiversity. It’s estimated that 92% of land species and 95% of marine species are at risk of shrinking or disappearing. On a large scale, humans’ impact on other species is already known, but it’s been difficult to assess this impact on individual regions, leaving many open questions.

A team from the University of Copenhagen set out to answer these questions with the help of big data. They “georeferenced 92,801 mitochondrial sequences for >4500 species of terrestrial mammals and amphibians.” In other words, they took DNA from the mitochondria, a particular part of the cell, and cross-referenced each sequence with the location of the animal it was taken from. Overall, they found that the closer a species lived to humans, the less genetic diversity it had. They hope their results are detailed enough to influence the policies and behavior of local governments.
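To make the idea of cross-referencing sequences with locations a little more concrete, here is a minimal Python sketch. It is not the Copenhagen team’s actual pipeline. It assumes that the sequences are already aligned to the same length, that diversity is measured as the average proportion of differing sites between pairs of sequences (a common measure called nucleotide diversity), and that locations are binned into simple latitude/longitude grid cells. The function names, the cell size, and the toy records are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import floor


def pairwise_diff_per_site(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Only compare sites where both sequences have an unambiguous base.
    compared = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def nucleotide_diversity(seqs):
    """Mean pairwise difference per site; None if fewer than two sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return None
    return sum(pairwise_diff_per_site(a, b) for a, b in pairs) / len(pairs)


def diversity_by_cell(records, cell_deg=1.0):
    """Group (species, lat, lon, seq) records by species and grid cell,
    then compute diversity within each group."""
    groups = defaultdict(list)
    for species, lat, lon, seq in records:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        groups[(species, cell)].append(seq)
    return {key: nucleotide_diversity(seqs) for key, seqs in groups.items()}


# Toy, invented records: (species, latitude, longitude, aligned sequence)
records = [
    ("species_a", 55.7, 12.5, "ACGTACGT"),
    ("species_a", 55.2, 12.9, "ACGTACGA"),
    ("species_a", 55.9, 12.1, "ACGAACGT"),
    ("species_a", 10.3, -84.1, "ACGTACGT"),
]

for key, value in diversity_by_cell(records).items():
    print(key, value)
```

With a table like this in hand, each grid cell’s diversity value could then be compared against a measure of human presence in that cell, which is the kind of comparison the study describes.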

Problems and Limitations

There’s no question that big data will usher in a new era. “This is an extremely exciting time—the field is growing from a dozen or so studies, to hundreds on the horizon,” said Sean Hoban. “It feels as though we’re in a new chapter of scientific advancement.” But not everyone is so excited.

In March 2021, a paper was published in the journal Ecology Letters with three warnings about using big data in biology. First, data selection: with so much data to choose from, who is to say which should be included? The authors believe researchers may consciously or unconsciously choose data that produce a more desirable result. Second, data sets may be incomplete: rather than covering everything, they may reflect the bias of those compiling them. That is, only the data that seems relevant at the time might be included. Third, interpreting the data is highly subjective: the authors reexamined the results from previous studies and found that the conclusions were not supported.

However, none of these problems is new to big data. Other fields experienced the same growing pains, and just like them, biology will grow and evolve.
