A unified model explaining the sequence basis of transcription initiation in the human genome. Puffin predicts transcription initiation signals by first detecting the sequence patterns that appear in a DNA sequence and then applying the effects of all sequence patterns to the transcription initiation signal. The model includes three types of sequence patterns: motifs, initiators, and trinucleotides. Strand-specific, base-pair-resolution transcription initiation signals are predicted by additively combining the pattern effects on a logarithmic scale and then converting them to the output scale. bp, base pair. Credit: Science (2024). DOI: 10.1126/science.adj0116
A team led by researchers at UT Southwestern Medical Center has developed a deep learning model that identifies a set of simple rules controlling the activity of promoters, the regions of DNA that initiate the process by which genes produce proteins.
Their findings, published in Science, could lead to a better understanding of how promoters contribute to gene regulation in health and disease.
“Promoters are essential to the function of any gene, but our understanding of how these genetic elements function is incomplete despite decades of research that have defined many of their characteristics. Our study sheds new light on how these sequences function in humans and other mammals,” said Dr. Jian Zhou, assistant professor in the Lyda Hill Department of Bioinformatics at UT Southwestern.
Dr. Zhou co-led the study with first author Kseniia Dudnyk, a graduate student in the Zhou Lab, and Dr. Jiang Xu, a former research associate at the Children's Medical Center Research Institute at UT Southwestern.
The creation of the proteins that cells use to carry out their activities begins with a process known as transcription. This is when the RNA polymerase protein latches onto the DNA strand and copies (or transcribes) the encoded information into an RNA molecule. The region where RNA polymerase binds and initiates transcription is called the promoter.
In humans, promoters are typically made up of hundreds of base pairs, the building blocks of DNA. Researchers have identified common base-pair sequences shared among some promoters, but these sequences are often absent from human promoters, and the rules by which DNA sequence directs the transcription process remain unclear.
To better define human promoters and how they work, researchers developed a machine learning program they named Puffin. After analyzing data from tens of thousands of recognized human promoters, the program determined that they were composed of three types of sequence patterns: motifs, initiators, and trinucleotides.
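The two-stage idea described in the figure caption (detect sequence patterns, then add up their effects on a logarithmic scale and convert to the output scale) can be illustrated with a toy example. The sketch below is not Puffin's code: the patterns, their effect curves, the exact-match detection step, and the use of an exponential for the final conversion are all hypothetical choices made for illustration, and strand specificity is ignored.

```python
import math

def detect(sequence, pattern):
    """Stage 1 (toy): 1.0 at every position where `pattern` starts, else 0.0."""
    k = len(pattern)
    return [1.0 if sequence[i:i + k] == pattern else 0.0
            for i in range(len(sequence))]

def predict_initiation(sequence, patterns):
    """Stage 2 (toy): sum every pattern's effect curve on a log scale,
    then convert the total to a non-negative output scale."""
    log_signal = [0.0] * len(sequence)
    for pattern, effect in patterns.items():
        for start, activation in enumerate(detect(sequence, pattern)):
            if activation == 0.0:
                continue
            # `effect` maps an offset from the pattern start to a log-scale weight.
            for offset, weight in effect.items():
                pos = start + offset
                if 0 <= pos < len(sequence):
                    log_signal[pos] += activation * weight
    # Assumption: exp() stands in for the model's real output transformation.
    return [math.exp(x) for x in log_signal]

# Hypothetical patterns and effect curves, invented for this example only.
TOY_PATTERNS = {
    "TATAAA": {29: 1.5, 30: 2.0, 31: 1.5},  # motif acting about 30 bp downstream
    "CA": {1: 0.8},                         # initiator-like pattern acting locally
    "GGG": {0: -0.3},                       # mildly repressive trinucleotide
}

seq = "GCGC" + "TATAAA" + "GC" * 12 + "CAGT" + "GGGC"
signal = predict_initiation(seq, TOY_PATTERNS)
peak = max(range(len(seq)), key=signal.__getitem__)
print(f"strongest predicted initiation at position {peak}: {signal[peak]:.2f}")
```

In this toy sequence the motif and the initiator-like pattern both contribute at the same position, so their log-scale effects add and the predicted signal peaks there, while the repressive trinucleotide pushes the signal at its own position below the baseline of 1.0.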
Puffin showed that, depending on how these elements are arranged, gene transcription can be activated or repressed. Puffin also predicted how the arrangement of these elements causes RNA polymerase to preferentially transcribe a single strand of DNA or both strands simultaneously in opposite directions. This bidirectional transcription is common in human genes.
The program also showed that mice and other mammals share a similar set of rules for controlling promoter operation. Additionally, Puffin allows researchers to predict whether and how transcription will occur if they mutate a promoter, and its predictions are generally consistent with experimental results.
The study authors suggested that Puffin may help explain how promoters function in healthy cells and how disease-related promoter changes alter gene transcription.
This program is available on a free web server (tss.zhoulab.io), allowing other researchers to test promoter sequences of interest. They added that using similar machine learning approaches could provide insight into other aspects of the genome that are still poorly understood.
For more information:
Kseniia Dudnyk et al, Sequence basis of transcription initiation in the human genome, Science (2024). DOI: 10.1126/science.adj0116
Journal information:
Science
