Deep learning model developed to help predict the functional impact of regulatory mutations

A research team co-led by scientists from the Netherlands Cancer Institute (NKI) and the Oncode Institute has developed PARM (Promoter Activity Regulation Model), a deep learning model that provides new insights into how transcription factors regulate human promoters, and how genes know when to switch on or off.

The researchers say scientists can now begin to use the tool to read this genetic information and generate leads for new approaches to cancer diagnosis, patient stratification, and future treatments. They also suggest that the findings show gene regulation to be much more predictable than previously thought.

“We can now actually read the language of gene regulatory systems,” said Bas van Steensel, Ph.D., group leader at the Netherlands Cancer Institute (NKI) and the Oncode Institute, and co-senior and co-corresponding author of the team's paper published in Nature. “Using our PARM model, we are now able to uncover these rules at scale, allowing us to understand and even predict how regulatory DNA controls gene activity.” In the paper, titled “Regulatory Grammar of the Human Promoter Revealed by MPRA-Based Deep Learning,” the authors conclude, “Our approach provides a highly economical strategy to better understand the dynamic control of human promoters by transcription factors.”

Promoters are the core regulatory elements of all genes, the researchers wrote. “Their activity ensures the correct transcriptional levels of individual genes, which is essential for cellular homeostasis and response to a wide range of signals.” Van Steensel further explained, “The classical genetic code explains how genes in DNA code for proteins, but we honestly didn’t understand how most genes are regulated. We know that the DNA between genes contains regulatory elements such as promoters. This control system determines whether a gene is turned on or off, in which cells, and how strongly.”

At the same time, most cancer-associated mutations are located in non-coding parts of the genome, and until now it has been very difficult to interpret such mutations. “Building computational models that can predict promoter activity from DNA sequences is difficult,” the scientists noted.

PARM grew out of a bold mission to decipher the genome’s operating system and involved seven research groups collaborating within the Oncode Institute’s PERICODE project. The development work combined laboratory experiments with computation.

Hatice Yücel and Max Trauernicht from the Bas van Steensel research group at the Netherlands Cancer Institute, where the technology underlying the new AI model PARM was developed. [©Netherlands Cancer Institute / Sanne Hijlkema]

The researchers describe the PARM model as “…a cell type-specific deep learning model trained on a specially designed massively parallel reporter assay (MPRA) that queries human promoter sequences.” The MPRA technology, developed in van Steensel's lab at NKI, allows researchers to measure gene regulation on an unprecedented scale. But data alone doesn’t always provide insight, which is why scientists from Dr. Jeroen de Ridder’s lab at UMC Utrecht and the Oncode Institute took on the challenge.

Using large amounts of data specific to gene regulation, the team trained an AI model that captures the biological rules underlying gene activation. “…we present a platform that combines optimized MPRA and deep learning to efficiently build sequence-activity models of all human promoters,” the authors state. De Ridder added, “Most AI models learn from whatever data happens to be present. Here, measurements and AI were designed together. This allowed us to create hyperefficient models for specific cell types that can be applied at a previously unthinkable scale.”

The new model allows the researchers to predict how gene regulation differs between cell types and how it changes when cells are exposed to stimuli such as certain drugs. The model also revealed in great detail the structure of each gene’s “on and off button.” “We leveraged PARM to systematically identify binding sites for transcription factors that likely contribute to the activity of native human promoters and detect the rewiring of these regulatory interactions after different stimuli to cells,” the researchers explained. Importantly, the team didn’t stop at predictions. They ran rigorous experimental tests on the model outputs to confirm that the predictions are correct.
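
To make the idea of finding binding sites with a sequence-to-activity model concrete, here is a minimal Python sketch of in silico mutagenesis, a general technique in which each base is substituted in turn and the change in predicted activity is recorded. It is an illustration only and is not taken from the PARM code: `toy_activity`, the motif `TGACGT`, and the example sequence are all hypothetical stand-ins for a trained model and a real promoter.

```python
from typing import Callable, List

BASES = "ACGT"


def toy_activity(seq: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns the number of occurrences of a made-up motif; a real
    sequence-to-activity model would be a learned neural network.
    """
    motif = "TGACGT"
    hits = 0
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            hits += 1
    return float(hits)


def position_importance(seq: str, predict: Callable[[str], float]) -> List[float]:
    """Mean drop in predicted activity when each position is mutated.

    For every position, the base is replaced by each of the three
    alternatives, and the average loss relative to the unmutated
    sequence is stored. High scores mark bases the model relies on.
    """
    baseline = predict(seq)
    scores = []
    for i, ref in enumerate(seq):
        drops = [
            baseline - predict(seq[:i] + alt + seq[i + 1:])
            for alt in BASES
            if alt != ref
        ]
        scores.append(sum(drops) / len(drops))
    return scores


# Made-up sequence with the toy motif embedded at positions 4 to 9.
promoter = "AAAATGACGTAAAA"
scores = position_importance(promoter, toy_activity)
important = [i for i, s in enumerate(scores) if s > 0.5]
print(important)
```

In this toy setting, the positions flagged as important are exactly those covered by the embedded motif. With a real model, a contiguous run of high-scoring positions would be a candidate transcription factor binding site that still needs experimental confirmation.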

This screenshot of the PARM model shows one of the genes described in the Nature paper (APOC2). Several DNA letters standing next to each other usually indicate that a transcription factor binds there and activates the gene. [©Netherlands Cancer Institute]

Despite significant advances in this field, existing AI models may be too computationally heavy to apply to the vast number of mutations that exist, or too general to adequately capture the diversity of cell types. The PARM model changes that. It allows researchers to predict the functional impact of regulatory mutations in specific cell types under specific conditions, such as drug treatment, opening new avenues for cancer diagnosis, patient stratification, and future treatments.
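
As a rough illustration of what predicting the functional impact of a regulatory mutation can mean in practice, the Python sketch below scores a single-base variant as the difference between the predicted activity of the mutated and the reference sequence, separately for each cell type. Everything here is assumed for the example: the cell type names, the motifs and weights in `TOY_MODELS`, and the sequence are hypothetical, and the motif-counting function merely stands in for a trained cell type-specific model.

```python
from typing import Dict

# Hypothetical per-cell-type "models": motif -> contribution to activity.
# Real cell type-specific models would be learned from MPRA data.
TOY_MODELS: Dict[str, Dict[str, float]] = {
    "cell_type_A": {"TGACGT": 2.0},
    "cell_type_B": {"GGGCGG": 1.5},
}


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(
        seq[i:i + len(motif)] == motif
        for i in range(len(seq) - len(motif) + 1)
    )


def predict_activity(seq: str, cell_type: str) -> float:
    """Toy activity: weighted motif counts for the given cell type."""
    weights = TOY_MODELS[cell_type]
    return sum(w * count_motif(seq, m) for m, w in weights.items())


def variant_effect(ref_seq: str, pos: int, alt: str, cell_type: str) -> float:
    """Predicted activity of the variant minus that of the reference.

    Negative values mean the mutation is predicted to lower promoter
    activity in this cell type; zero means no predicted effect.
    """
    if ref_seq[pos] == alt:
        raise ValueError("alt base is identical to the reference base")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict_activity(alt_seq, cell_type) - predict_activity(ref_seq, cell_type)


# Made-up reference sequence; position 6 lies inside the cell_type_A motif.
ref = "AAAATGACGTAAAA"
effects = {ct: variant_effect(ref, 6, "T", ct) for ct in TOY_MODELS}
print(effects)
```

The same variant gets a different score in each toy cell type because each one weights different motifs, which is the reason cell type-specific models matter when interpreting non-coding mutations.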

In a newly published paper, van Steensel, de Ridder and colleagues say, “With this platform, named PARM, both data generation and computational modeling are very economical. With this development, we were able to build sequence-activity models of all human promoters in 10 different cell types and after exposing the cells to several stimuli.”

Van Steensel pointed to Google DeepMind, which recently described in Nature its AlphaGenome model, which also aims to understand gene regulation. “This is a great model,” van Steensel said. “However, PARM is more flexible and experimentally and computationally lightweight. This tool requires about 1000 times less computational power than AlphaGenome, making it far more feasible for academic researchers around the world. Using this model, we only need one dish of cells and one day of calculations to see in detail how a particular cell type, such as a tumor cell, uses its DNA code to respond to signals such as hormones, nutrients, and drugs.”

Van Steensel and colleagues conclude in the paper that “PARM complements other deep learning approaches for modeling enhancer element grammars and designing artificial promoters, demonstrating that lightweight models trained on small functional genomic datasets are a viable and powerful alternative to large-scale modeling efforts.”
