Researchers use machine learning to identify ‘synthetic extreme’ DNA sequences


Artificial intelligence has exploded across our news feeds, bringing ChatGPT and related AI technologies to the broader public eye. Beyond popular chatbots, biologists are exploring ways to use AI to investigate core gene functions.

Previously, researchers at the University of California, San Diego, who study the DNA sequences that turn genes on, used artificial intelligence to identify an enigmatic puzzle piece tied to gene activation, a fundamental process involved in growth, development, and disease. Using machine learning, a form of artificial intelligence, Professor James T. Kadonaga of the School of Biological Sciences and his colleagues discovered the downstream core promoter region (DPR), a “gateway” DNA activation code involved in the operation of up to one-third of our genes.

Building on this discovery, Kadonaga and researchers Long Vo ngoc and Torrey E. Rhyne have now used machine learning to identify “synthetic extreme” DNA sequences with specifically designed functions in gene activation. In work published in the journal Genes & Development, the researchers used machine learning to test millions of different DNA sequences, comparing the DPR gene activation element in humans with that in fruit flies (Drosophila). Using AI, they were able to find rare, custom-tailored DPR sequences that are active in humans but not in Drosophila, and vice versa. More generally, this approach could now be used to identify synthetic DNA sequences with activities that could be useful in biotechnology and medicine.

“In the future, this strategy could be used to identify synthetic extreme DNA sequences with practical and useful applications. Instead of comparing humans (condition X) with Drosophila (condition Y), we could look for sequences through which drug A (condition X), but not drug B (condition Y), activates a gene. This method could also be used to find custom-tailored DNA sequences that activate a gene in tissue 1 (condition X) but not in tissue 2 (condition Y). There are countless practical applications of this AI-based approach. The synthetic extreme DNA sequences might be very rare, perhaps one in a million. If they exist, they could be found by using AI.”

James T. Kadonaga, Professor of Molecular Biology, University of California, San Diego

Machine learning is a branch of AI in which computer systems continuously improve and learn based on data and experience. In the new study, Kadonaga, Vo ngoc (a former postdoctoral researcher at the University of California, San Diego, now at Velia Therapeutics) and Rhyne (a staff researcher) used a technique known as support vector regression to train machine learning models on 200,000 established DNA sequences whose activity was based on data from real laboratory experiments. These were the targets presented as examples to the machine learning system. The researchers then “fed” 50 million test DNA sequences to the human and Drosophila machine learning systems and asked them to compare the sequences and identify unique sequences within the two massive datasets.

The machine learning systems showed that the human and Drosophila sequences largely overlap, so the researchers focused on the core question of whether the AI models could identify the rare instances in which gene activation is highly active in humans but not in Drosophila. The answer was yes. The machine learning models successfully identified human-specific (and Drosophila-specific) DNA sequences. Importantly, the AI-predicted functions of the extreme sequences were validated in Kadonaga’s lab using conventional (wet lab) testing methods.
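The screening step described above can be illustrated with a minimal sketch: score every candidate sequence under two conditions and keep only those predicted to be highly active in one and inactive in the other. Everything named here is an assumption for illustration. The functions `toy_human_score` and `toy_fly_score` are trivial stand-ins for trained models, not the study's support vector regression models, and the sequence length, candidate count, and thresholds are arbitrary.

```python
import random

BASES = "ACGT"


def random_sequences(n, length, seed=0):
    """Generate n random DNA strings of the given length (reproducible via seed)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


def find_extremes(sequences, score_x, score_y, min_x, max_y):
    """Keep sequences scored >= min_x under condition X and <= max_y under condition Y.

    Results are sorted so the largest gap between the two scores comes first.
    """
    hits = []
    for seq in sequences:
        x = score_x(seq)
        y = score_y(seq)
        if x >= min_x and y <= max_y:
            hits.append((seq, x, y))
    hits.sort(key=lambda h: h[1] - h[2], reverse=True)
    return hits


# Hypothetical stand-ins for trained models: real models would be fitted
# to laboratory activity measurements, not to base counts.
def toy_human_score(seq):
    return seq.count("G") / len(seq)


def toy_fly_score(seq):
    return seq.count("A") / len(seq)


candidates = random_sequences(10_000, 20)
extremes = find_extremes(
    candidates, toy_human_score, toy_fly_score, min_x=0.5, max_y=0.1
)
print(len(extremes), "of", len(candidates), "candidates pass both thresholds")
```

Because the filter only needs two scoring functions, the same sketch applies to any pair of conditions, such as drug A versus drug B or tissue 1 versus tissue 2.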

“Before embarking on this research, we didn’t know whether our AI models would be ‘intelligent’ enough to predict the activity of 50 million sequences, especially the outlier ‘extreme’ sequences with unusual activity. It is remarkable that the AI models could predict the activity of extreme sequences that are as rare as one in a million,” Kadonaga said. He added that it would be essentially impossible to perform the equivalent 100 million wet lab experiments that the machine learning technology analyzed, since each wet lab experiment would take nearly three weeks to complete.

The rare sequences identified by machine learning systems serve as successful demonstrations and set the stage for other uses of machine learning and other AI technologies in biology.

“In everyday life, people are finding new applications for AI tools such as ChatGPT. Here, we demonstrated the use of AI for designing customized DNA elements in gene activation. This method should have practical applications in biotechnology and biomedical research,” said Kadonaga. “More broadly, biologists are probably just beginning to harness the power of AI technology.”

Source:

University of California, San Diego

Journal reference:

Vo ngoc, L., et al. (2023) Analysis of the Drosophila and human DPR elements reveals a distinct human variant whose specificity can be enhanced by machine learning. Genes & Development. doi.org/10.1101/gad.350572.123.


