Machine Learning Method Developed by CMU Researchers Reveals Fundamental Aspects of Evolution



A team of researchers from Carnegie Mellon University's Computational Biology Department (CBD) has developed a new method to identify parts of the genome that are important for understanding how specific traits have evolved across species.

Science cover featuring the story of Zoonomia


The study, published in the journal Science, was led by Andreas Fenning, an assistant professor in the Computer Science Department, and contributes to the Zoonomia Project, an effort to sequence the whole genomes of 240 mammalian species to uncover fundamental aspects of genes and traits that have important implications for protecting human health and conserving biodiversity. Making sense of these large new datasets requires the latest artificial intelligence (AI) and machine learning (ML) techniques.

One portion of the genome, known as coding DNA, provides the instructions for producing proteins, which are essential regulators of cell function. Over time, subtle changes in those protein-coding instructions have been one of the driving forces of evolution.

However, the DNA segments that encode these proteins account for only about 1% of the 3 billion nucleotide pairs in the human genome. Other, noncoding regions of DNA, known as enhancers, determine when and where specific genes are activated. To learn more about how these regions work, the CMU team created an ML approach called the Tissue-Aware Conservation Inference Toolkit (TACIT). Whereas conventional evolutionary models might explain a change in a species' brain size through a series of mutations in a cluster of genes, enhancers may achieve the same result simply by turning genes on or off.

Most studies of mammalian evolution focus on parts of the genome that have changed relatively little over millions of years. These conserved regions, especially genes, provide insight into the fundamental elements of mammalian DNA that underlie the unique characteristics of individual species.

The challenge for Fenning and his team is that, over time, enhancer regions of DNA may change in sequence but not in function. For example, the well-studied pancreatic islet enhancer regulates gene levels in a similar pattern in humans, mice, zebrafish and sponges, despite more than 700 million years of evolution. This makes enhancers much more difficult to identify and track using traditional methods that examine individual nucleotides.

TACIT addresses this issue by accurately predicting whether an enhancer is active in a particular cell type or tissue. This allows scientists to identify these important enhancer regions in newly sequenced genomes without conducting new laboratory experiments, opening up potential applications in conservation biology. The toolkit can predict how enhancers function in threatened or endangered species for which controlled laboratory experiments are not possible.

“TACIT provides an unprecedented opportunity to predict the function of non-gene parts of the genome in species for which primary tissue samples are unavailable, such as bottlenose dolphins and the endangered black rhinoceros,” said Eileen Kaprow, a lead author of the paper who is a CBD postdoctoral fellow and Lane Fellow. “As ML methods and techniques for identifying enhancers from specific cell types improve, we expect to be able to extend the capabilities of TACIT to provide new kinds of insights into mammalian evolution.”

After predicting enhancer function across the 240 mammalian genome sequences, the researchers applied TACIT to identify parts of the genome associated with the evolution of larger brains in mammals. They found that those regions tended to lie near genes in which mutations are implicated in human brain size disorders. They also identified enhancers associated with social behavior across mammals that are specific to a particular neuronal subtype: parvalbumin-positive inhibitory interneurons.

“We think this is just the tip of the iceberg,” said Fenning, who led the study. “By applying TACIT to a few tissues and a few traits, we found some interesting relationships, but there is still much more to discover.”

In addition to Fenning and Kaprow, lead authors on the paper include Alyssa Lawler, a former Ph.D. student in biological sciences who is now at the Broad Institute, and Daniel Schaefer, a recent graduate of CBD's undergraduate program. Schaefer's co-first authorship of this publication is strong evidence of the undergraduate program's innovative curriculum, which focuses on cutting-edge computational techniques and emphasizes hands-on scientific research opportunities.

For more information on the Zoonomia Project, please visit its website.


