summary: Researchers are using artificial intelligence (AI) to delve into the mechanisms of gene activation, a critical process in growth, development and disease. The researchers used machine learning to identify ‘synthetic extreme’ DNA sequences that play specific roles in gene activation.
These sequences were discovered by testing millions of DNA sequences and comparing human and Drosophila gene activation elements. This approach can be used to identify synthetic DNA sequences with potentially important applications in biotechnology and medicine.
Important facts:
- Researchers at the University of California, San Diego used machine learning techniques to identify custom-made DPR sequences that are active in humans but not in Drosophila, or vice versa.
- The research team used a method called support vector regression to train a machine learning model on 200,000 established DNA sequences.
- The rare sequences identified by machine learning systems set the stage for the broader use of machine learning and other AI technologies in biology.
source: UCSD
Artificial intelligence has exploded across our news feeds, bringing ChatGPT and related AI technologies to the broader public eye. Beyond popular chatbots, biologists are exploring ways to use AI to investigate core gene functions.
Previously, researchers at the University of California, San Diego, who study the DNA sequences that turn genes on, used artificial intelligence to identify an enigmatic puzzle piece tied to gene activation, a fundamental process involved in growth, development, and disease.
Using machine learning, a form of artificial intelligence, Professor James T. Kadonaga of the School of Biological Sciences and his colleagues discovered the downstream core promoter region (DPR), a “gateway” DNA activation code involved in the operation of up to one-third of our genes.
Building on this finding, Kadonaga and researchers Long Vo ngoc and Torrey E. Rhyne have now used machine learning to identify “synthetic extreme” DNA sequences with specifically designed functions in gene activation.
Publishing in the journal Genes & Development, the researchers tested millions of different DNA sequences through machine learning (AI) by comparing the DPR gene activation element in humans and fruit flies (Drosophila).
Using AI, they were able to find rare, custom-made DPR sequences that are active in humans but not in Drosophila, and vice versa.
More generally, this approach can now be used to identify synthetic DNA sequences with activity that could be useful in biotechnology and medicine.
“In the future, this strategy could be used to identify synthetic extreme DNA sequences with practical and useful applications. Instead of comparing humans (condition X) and Drosophila (condition Y), we could test the ability of drug A (condition X) but not drug B (condition Y) to activate a gene,” said Kadonaga, a distinguished professor in the Department of Molecular Biology.
“This method can also be used to find custom-made DNA sequences that activate genes in tissue 1 (condition X) but not in tissue 2 (condition Y). There are countless examples of applications.
“Synthetic extreme DNA sequences may be very rare, perhaps one in a million. If they exist, AI could be used to find them.”
Machine learning is a branch of AI in which computer systems continuously improve and learn based on data and experience.
In the new study, Kadonaga, Vo ngoc (a former postdoctoral fellow at the University of California, San Diego, now at Velia Therapeutics) and Rhyne (a staff researcher) used a technique known as support vector regression to “train” machine learning models on 200,000 established DNA sequences, based on data from real laboratory experiments.
These were the targets presented as examples to the machine learning system. They then “fed” 50 million test DNA sequences into the human and Drosophila machine learning systems and asked them to compare the sequences and identify unique sequences within the two giant datasets.
The machine learning systems showed that the human and Drosophila sequences largely overlap, but the researchers focused on the core question of whether the AI models could identify rare instances in which gene activation is highly active in humans but not in Drosophila.
The answer was yes.
Machine learning models have successfully identified human-specific (and Drosophila-specific) DNA sequences. Importantly, the functionality of the AI-predicted extreme sequences was validated in Kadonaga’s lab using conventional (wet lab) testing methods.
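The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the two scoring functions are toy stand-ins for the trained support vector regression models, the preferred motifs "ACGTAC" and "ACGGTT" are invented, and the names make_scorer and find_extremes are hypothetical. The sketch enumerates every 6-base sequence, scores each one under both conditions, and keeps only the rare sequences that score high in one condition and much lower in the other.

```python
from itertools import product

BASES = "ACGT"

def make_scorer(preferred):
    """Toy activity score: number of positions matching `preferred`.

    Stand-in for a trained regression model; the motif is invented.
    """
    def score(seq):
        return sum(a == b for a, b in zip(seq, preferred))
    return score

# Hypothetical preferences that overlap at the first three positions,
# so most sequences score similarly in both conditions.
score_human = make_scorer("ACGTAC")
score_fly = make_scorer("ACGGTT")

def find_extremes(candidates, score_x, score_y, min_x, min_gap):
    """Keep sequences scoring at least min_x in condition X
    and at least min_gap higher in X than in condition Y."""
    return [
        s for s in candidates
        if score_x(s) >= min_x and score_x(s) - score_y(s) >= min_gap
    ]

# Every possible 6-base sequence: 4**6 = 4096 candidates.
candidates = ["".join(p) for p in product(BASES, repeat=6)]

human_specific = find_extremes(candidates, score_human, score_fly, min_x=5, min_gap=3)
fly_specific = find_extremes(candidates, score_fly, score_human, min_x=5, min_gap=3)

print(len(candidates), len(human_specific), len(fly_specific))
```

Swapping the two scoring functions is all it takes to search in the opposite direction, which mirrors the "condition X but not condition Y" framing in Kadonaga's description.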
“Before undertaking this research, we did not know whether the AI models would be ‘intelligent’ enough to predict the activities of 50 million sequences, especially the outlier ‘extreme’ sequences with unusual activity.
“Therefore, it is very impressive and quite remarkable that the AI models were able to predict the activity of the rare one-in-a-million extreme sequences,” Kadonaga said. He added that it would be essentially impossible to perform in the wet lab the comparable 100 million experiments that the machine learning technology analyzed, because each wet lab experiment would take nearly three weeks to complete.
The rare sequences identified by machine learning systems serve as successful demonstrations and set the stage for other uses of machine learning and other AI technologies in biology.
“In everyday life, people are finding new applications for AI tools such as ChatGPT. Here, we demonstrated the use of AI for designing customized DNA elements in gene activation.
“This method should be practical in biotechnology and biomedical research,” Kadonaga said.
“More broadly, biologists are probably just beginning to harness the power of AI technology.”
About this artificial intelligence and genetics research news
author: Mario Aguilera
source: UCSD
contact: Mario Aguilera – UCSD
image: Image credited to Neuroscience News
Original research: closed access.
“Analysis of the Drosophila and human DPR elements reveals distinct human variants that can be enhanced in specificity by machine learning” by James T. Kadonaga et al. Genes & Development
overview
Analysis of Drosophila and human DPR elements reveals distinct human variants that can be enhanced in specificity by machine learning
The RNA polymerase II core promoter is the site of convergence of the signals that lead to the initiation of transcription. Here, we performed a comparative analysis of the downstream core promoter region (DPR) in Drosophila and humans by using machine learning.
These studies revealed a distinct human-specific version of the DPR and led to the use of machine learning models to identify synthetic extreme DPR motifs with specificity for human transcription factors relative to Drosophila factors, and vice versa.
More generally, machine learning models can similarly be used to design synthetic DNA elements with customized functional properties.
