DeepMind's AlphaGenome uses AI to decode non-coding DNA for research.


DeepMind's AlphaGenome aims to decode the “dark matter” in DNA

This AI system can analyze up to one million DNA letters at once and predict how small changes in non-coding regions can contribute to conditions ranging from cancer to rare genetic disorders. That ability could revolutionize personalized medicine.

DNA (deoxyribonucleic acid) sequence and magnifying glass, illustration

It sounds like an impossible puzzle: take a code three billion letters long and predict what will happen if you change a single letter. The code in question, the human genome, carries most of its instructions in genetic “dark matter.” AlphaGenome, an artificial intelligence system just released by London-based Google DeepMind, aims to show how even small changes in these non-coding sections affect gene expression.

DeepMind's newly released technology could change the way genetic diseases are treated. Scientists long dismissed non-coding DNA as “junk” but have since found that this so-called dark matter controls when and how genes switch on or off. AlphaGenome shows promise in predicting how mutations in these regions cause disease, from certain cancers to rare disorders in which vital proteins are never produced. By revealing these hidden control switches, AlphaGenome could help researchers design therapies that target genetic conditions affecting millions of people.

Understanding the complexity of the task AlphaGenome was built to tackle, however, requires a look at how the definition of a “gene” has evolved. The term was coined in 1909 to describe an invisible unit of heredity, an idea Gregor Mendel had proposed in 1865. By the 1940s the notion of “one gene, one enzyme” had taken hold, and by the 1960s textbooks taught that a stretch of DNA counted as a gene only if it coded for a specific protein.

Over the past 20 years, the definition has broadened with the discovery of genes that encode many types of RNA that are never translated into protein. Today a gene is generally thought of as a segment of DNA whose RNA or protein product performs a biological function. This conceptual shift redraws the genome's real-estate map: only about 1-2% of human DNA directly encodes proteins, but under the broader definition about 40% of it consists of genes.

What remains unknown is crucial: more than a billion letters of code that can determine when and how genes are activated. Deciphering the relevant clues is one of biology's toughest challenges, because they lie far apart in the sequence and act through complex circuits of gene regulation. AlphaGenome's goal is to understand how these regions affect gene expression and how even small changes can tip the balance between health and illness. To that end, according to a statement issued by DeepMind, the AI system takes DNA sequences up to one million letters long as input and uses them to “predict thousands of molecular properties that characterize regulatory activities.”
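To make “DNA sequences up to one million letters long as input” more concrete, here is a minimal Python sketch of one way a sequence model can represent such input. It assumes one-hot encoding (each letter becomes a row of four 0/1 indicators for A, C, G and T), a common choice for DNA models that the article does not confirm for AlphaGenome. The names `MAX_LEN` and `one_hot_encode` are hypothetical and used only for illustration.

```python
# Minimal sketch: one-hot encode a DNA window for a sequence model.
# Assumption: one-hot encoding and these names are illustrative only,
# not AlphaGenome's documented input pipeline.

MAX_LEN = 1_000_000  # upper bound on input length cited in the article
BASES = "ACGT"


def one_hot_encode(seq: str) -> list[list[int]]:
    """Return one [A, C, G, T] indicator row per letter; 'N' (unknown) becomes all zeros."""
    seq = seq.upper()
    if len(seq) > MAX_LEN:
        raise ValueError(f"sequence has {len(seq):,} letters; limit is {MAX_LEN:,}")
    rows = []
    for letter in seq:
        if letter == "N":
            rows.append([0, 0, 0, 0])
        elif letter in BASES:
            rows.append([1 if letter == base else 0 for base in BASES])
        else:
            raise ValueError(f"unexpected character {letter!r}")
    return rows


print(one_hot_encode("ACGTN"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
```

A one-million-letter window encoded this way becomes a 1,000,000 × 4 grid of numbers, which gives a sense of how much context a single prediction can draw on.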

AlphaGenome has already reproduced findings from genetics research. In a June 2025 preprint study (not yet peer-reviewed), the AlphaGenome team explained that they used the model to run simulations that reflected known DNA interactions. When AlphaGenome simulated the interactions within a stretch of DNA containing both a gene and mutations, it predicted the same complex chain of events already observed in laboratory experiments.
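The kind of simulation described above can be illustrated with a generic technique often called in-silico mutagenesis: score a mutation by comparing a model's prediction for the original (reference) sequence with its prediction for the mutated sequence. The Python sketch below is a hypothetical illustration, not AlphaGenome's code or API. In particular, `toy_predict` is a made-up stand-in that simply counts copies of one arbitrarily chosen motif ("TATAAA"), whereas the real model predicts thousands of molecular properties.

```python
# Minimal sketch of in-silico variant scoring: compare a model's prediction
# for a reference sequence with its prediction for a mutated copy.
# Assumption: `toy_predict` is a hypothetical stand-in for a real model; it
# counts copies of one illustrative motif as a proxy for regulatory activity.
from typing import Callable


def toy_predict(seq: str) -> float:
    """Hypothetical predictor: number of (possibly overlapping) motif matches."""
    motif = "TATAAA"
    return float(sum(1 for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif))


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Swap the single letter at 0-based `pos`, checking the reference letter first."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   predict: Callable[[str], float] = toy_predict) -> float:
    """Effect score = prediction(mutant) - prediction(reference)."""
    return predict(apply_snv(seq, pos, ref, alt)) - predict(seq)


reference = "GCGCTATCAAGCGC"
print(variant_effect(reference, 7, "C", "A"))  # 1.0: the change creates one motif copy
```

A model that outputs many properties at once would return a vector rather than a single number, so the same reference-versus-mutant difference would be computed for each output.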

AlphaGenome is currently available only for noncommercial use, but the scientific community's response so far has been enthusiastic, with biotech start-ups and university researchers alike publicly expressing excitement about the system's potential to accelerate research.

Limitations remain. AlphaGenome struggles to capture interactions between DNA regions separated by more than 100,000 letters, misses some tissue-specific nuances and is not designed to predict traits from an individual's complete genome. Complex diseases that depend on development and the environment are also outside its direct scope. Still, the system points to a wide range of uses. By tracing how small changes ripple through gene regulation, it could help identify the roots of genetic disorders. It could aid in the design of synthetic DNA. And above all, it could provide a faster way to map the genome's complex regulatory circuits.
