Using machine learning to map 3D super-enhancers and pinpoint regulators of cell identity

(Memphis, TN – March 9, 2026) Scientists typically study the molecular mechanisms that control gene expression from the perspective of a linear, two-dimensional genome, even though DNA and its bound proteins function in three dimensions (3D). To better understand how key components of this machinery, such as super-enhancers, control genes in this 3D reality, scientists at St. Jude Children’s Research Hospital developed a new algorithm called BOUQUET. BOUQUET used machine learning to reveal that a set of genes and their regulatory elements can interact within protein condensates, dense membraneless droplets within the cell nucleus. The findings, published today in Nucleic Acids Research, provide new insights into how cells regulate the genes that control their specialized identity.

Cells express specific sets of genes to perform specific functions. For example, blood cells and brain cells express different context-specific genes. Human DNA has 3 billion base pairs, and genes involved in cell identity are scattered throughout. To complicate matters further, enhancers, the DNA elements that activate gene expression, can be thousands of DNA bases away from their target genes. Scientists led by Dr. Brian Abraham of the St. Jude Department of Computational Biology recognized that the challenge was finding, across such long distances, the complete set of enhancers and bound proteins linked to the expression of each gene. To address this issue, they created BOUQUET, which models 3D enhancer architectures in a machine learning-based graph theory framework. Using this approach, researchers can identify which genes are present within transcriptional protein condensates.

“BOUQUET allows us to quantify the activated protein apparatus associated with each gene,” said Abraham, corresponding author of the study. “This assignment provides two major advances: predicting gene expression from protein binding maps and discovering which genes are likely to interact with transcriptional condensates.”

Mapping the controllers of cell identity

Enhancers activate gene expression by binding to specific proteins and making contact with target genes. Abraham’s previous research showed that sets of enhancers called “super-enhancers” are located in linear proximity to genes encoding proteins with major roles in cell identity, such as regulators of differentiation and proteins that enable cells to perform identity-specific tasks.

“The idea that super-enhancers, linear groups of enhancers, play a major role in controlling cell identity has helped scientists understand many disease processes, but it has long been known that enhancers function in 3D, so we sought to merge these two concepts,” added co-first author Kelsey Maher, Ph.D., from the Department of Computational Biology. “The data measuring these 3D interactions is complex and noisy, so we needed to use more sophisticated methods to find groups of enhancers and their target genes. So we turned to graph theory and machine learning to capture the context of the entire network and learn the enhancer community.”
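The graph view described above can be illustrated with a minimal sketch. It is not BOUQUET's method, which this release does not detail: it assumes a hypothetical list of enhancer-gene contacts with made-up scores, discards weak contacts as noise, and uses plain connected components as a simple stand-in for community detection.

```python
from collections import defaultdict


def find_communities(contacts, min_score=0.5):
    """Group nodes joined by contacts scoring at least min_score.

    Connected components (via union-find) stand in for community
    detection here; the threshold and scores are illustrative only.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b, score in contacts:
        # Register both nodes even if the contact is too weak to merge them.
        root_a, root_b = find(a), find(b)
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical 3D contacts: (node, node, contact score).
contacts = [
    ("enhancer_1", "gene_A", 0.9),
    ("enhancer_2", "gene_A", 0.8),
    ("enhancer_3", "gene_B", 0.7),
    ("enhancer_2", "gene_B", 0.1),  # weak contact, treated as noise
]
communities = find_communities(contacts)
```

With these toy inputs, the weak contact between enhancer_2 and gene_B is ignored, so the two genes end up in separate communities along with their own enhancers.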

While other researchers have successfully grouped enhancers together, the Abraham lab went a step further by incorporating protein binding maps. “It has been assumed that the amount of an activating protein that can associate with a particular gene should be linked to the expression of that gene, but it is difficult to find such a correlation without knowing which genomic regions are important for the expression of each gene,” Abraham added. To their knowledge, his team is the first to show that enhancer-protein binding patterns are indeed quantitatively correlated with gene expression.

Transcriptional condensates containing multiple genes

Scientists in the Abraham lab called these groupings of enhancers “communities.” “This data argues that communities are the fundamental unit of gene regulation because each part of the community exhibits correlated activities and perturbations applied to one part of the community affect the entire community,” said co-first author Jie Lu, Ph.D., from the Department of Computational Biology.

Each community has a different level of associated proteins. The communities containing the most protein were called “3D super-enhancers” to reflect their relationship with linear super-enhancers. The results showed that all genes previously found to interact with transcriptional condensates were located within 3D super-enhancers, and the number of these protein-enriched communities matched the previously reported number of transcriptional condensates.
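The ranking idea in this passage can be sketched as follows. This is an assumption-laden illustration, not the published procedure: the community names, the per-element protein signal values and the `top_n` cutoff are all hypothetical, and the release does not say how BOUQUET decides where the cutoff falls.

```python
def rank_communities(communities, signal, top_n=1):
    """Rank communities by summed protein-binding signal.

    communities: dict mapping a community name to its member elements.
    signal: dict mapping an element to a protein-binding value.
    top_n: hypothetical cutoff for the most protein-rich communities.
    """
    totals = {
        name: sum(signal.get(member, 0.0) for member in members)
        for name, members in communities.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    top = [name for name, _ in ranked[:top_n]]
    return ranked, top


# Toy inputs; none of these values come from the study.
communities = {
    "community_1": ["enhancer_1", "enhancer_2"],
    "community_2": ["enhancer_3"],
}
signal = {"enhancer_1": 5.0, "enhancer_2": 7.0, "enhancer_3": 3.0}

ranked, top = rank_communities(communities, signal)
```

Here `top` holds the communities that would be labeled as 3D super-enhancer candidates under the chosen cutoff.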

“We thought that since 3D super-enhancers and condensates both contain many proteins, they might somehow be linked,” Lu added. “Not only did we predict and confirm a new condensate-associated gene, but we also observed two genes that shared the same condensate and were cotranscribed within it.” These genes, which come from the same community, are separated by 500,000 base pairs yet were exposed to the same biochemical and transcriptional environment at the same time.

“All of our research here aims to understand the mechanisms that control cell identity through regulation of transcription,” Lu continued. Transcriptional dysregulation is central to the identity of malignant cells, so understanding how this dysregulation occurs is of paramount importance. “When a disease-causing gene is aberrantly expressed, it is important to know whether it is regulated by a specific protein or a specific protein ensemble,” Abraham said. “Now we have clues into a multifaceted field to ask whether condensates may control disease gene expression.”

Authors and funders

Other authors of the study are Li Dong, Virginia Valentine, Seth Staller, Alaguraj Veluchamy, Li Tian, Yuna Kim, Bensheng Zhu, Marcus Valentine, John Easton, Stanley Pounds and Stephen Burden, all of St. Jude.

This research was supported by a grant from the St. Jude Children’s Research Hospital Transcription Collaborative and by American Lebanese Syrian Associated Charities (ALSAC), the fundraising and awareness organization of St. Jude.

St. Jude Media Contact

Michael Sheffield

Desk: (901) 595-0221

Cell phone: (901) 379-6072

michael.sheffield@stjude.org

media@stjude.org

St. Jude Children’s Research Hospital

St. Jude Children’s Research Hospital is leading the way the world understands, treats and cures devastating childhood illnesses. From cancer to life-threatening blood, neurological and infectious diseases, St. Jude is committed to advancing treatments and prevention tools through groundbreaking research and compassionate care. Through global collaboration and innovative science, St. Jude works to ensure that all children everywhere have the best chance for a healthy future. To learn more, visit stjude.org, read St. Jude Progress, a digital magazine, and follow St. Jude on social media at @stjuderesearch.
