Google’s new deep learning model can predict the effects of small changes to DNA sequences up to 1 million base pairs long, making it especially good for non-coding DNA, which has proven particularly difficult to understand. An artificial intelligence (AI) tool called AlphaGenome gives researchers a way to better understand the human genome and could help scientists develop treatments for diseases.
Small changes in the human genome can have large effects on a person’s health, causing genetic diseases such as cystic fibrosis and certain cancers. Most of these changes occur in the non-coding regions of the genome, which make up about 98% of human DNA. Because these regions do not code for proteins but instead regulate gene expression, changes there can have many different biological effects, which makes those effects difficult to predict.
Developed by Google DeepMind, AlphaGenome can predict the molecular impact of single base pair mutations across DNA sequences up to 1 million base pairs long. It builds on Google’s previous model, AlphaMissense, which could only assess the effects of variation in the protein-coding regions of the genome.

The new model was trained on human and mouse genomic data. It takes DNA sequences as input and predicts a range of genetic signals associated with specific biological functions, including gene expression, how accessible the DNA is to proteins, and where gene splicing occurs.
“The key [benefit] is that you can introduce mutations into the sequence, for example changing a C [base pair] to a T, and use the model to compare the differences,” said Google DeepMind researcher Žiga Avsec.
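The comparison Avsec describes, scoring a mutation by the difference between a model’s prediction for the original (reference) sequence and for the mutated (alternate) sequence, can be illustrated with a short Python sketch. Everything in it is hypothetical: toy_model simply counts a short DNA motif and stands in for a real predictor, and mutate and variant_effect are illustrative helpers. None of this is AlphaGenome or its API.

```python
"""Toy sketch: score a single-base mutation by comparing predictions.

ASSUMPTION: toy_model is a hypothetical stand-in (a motif counter),
not AlphaGenome and not any real library's API.
"""
from typing import Callable

VALID_BASES = set("ACGT")


def toy_model(seq: str, motif: str = "TATA") -> int:
    """Hypothetical 'predictor': count (possibly overlapping) motif matches."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def mutate(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based position `pos` changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    if alt not in VALID_BASES:
        raise ValueError(f"invalid base {alt!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    seq: str,
    pos: int,
    ref: str,
    alt: str,
    model: Callable[[str], int] = toy_model,
) -> int:
    """Effect of a variant = model(alternate sequence) - model(reference sequence)."""
    return model(mutate(seq, pos, ref, alt)) - model(seq)


if __name__ == "__main__":
    reference = "GGCATAGG"
    # Change the C at position 2 to a T; this creates a TATA motif.
    print(variant_effect(reference, 2, "C", "T"))  # prints 1
```

In this sketch, any function that maps a sequence to a number can be passed as model, so the comparison logic stays the same whatever predictor sits behind it.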
What do we mean when we say AI?
Artificial intelligence (AI) is an umbrella term that is often mistakenly used to encompass a variety of connected but simpler processes.
AI is the ability of a machine or computer program to perform tasks that are normally performed only by humans, such as reasoning, responding to feedback, and making decisions.
Generative AI is a newer type of AI that analyzes and detects patterns in its training data and generates original text, images, and videos in response to user prompts. ChatGPT, Microsoft Copilot, Google Gemini, and, more recently, X’s Grok are all examples of chatbots that use generative AI.
A neural network is an interconnected array of artificial neurons, loosely modeled on a biological brain, that identifies, analyzes, and learns from statistical patterns in data.
Machine learning is a subset of AI that allows machines to learn from datasets and make predictions about new data without being explicitly programmed to do so. Machine learning models generally perform better the more data they are trained on.
Deep learning is an advanced type of machine learning that uses neural networks with many layers to analyze complex data from very large datasets. Its applications include speech recognition, image generation, and translation.
A large language model, or LLM, is a type of deep learning model trained on large amounts of data to understand and generate language. LLMs learn patterns in text by predicting the next word in a sequence, and these models can now write prose, analyze text from the internet, and interact with users.
AlphaGenome matched or outperformed other state-of-the-art models in 25 out of 26 tasks predicting the effects of genetic variation. The research team was also able to simulate known DNA mutations that cause certain types of leukemia and predict the same results observed in the lab.
“Previously, the field required separate models for separate tasks,” Avsec said, adding that earlier models often had to trade off sequence length against resolution. “AlphaGenome brings these together under one roof.”
Natasha Latysheva, a senior research engineer at DeepMind, explains that AlphaGenome could advance fundamental knowledge of the genome, deepen understanding of rare diseases and cancers, and help scientists design new DNA sequences to treat specific conditions.
AlphaGenome joins a collection of AI tools developed by Google DeepMind, including AlphaFold, which predicts the 3D structure of proteins and whose developers shared the 2024 Nobel Prize in Chemistry. Pushmeet Kohli, who led the study, explains that “the genome is a recipe, and the focus of AlphaGenome is to understand the impact of changing parts of the recipe.”
AlphaGenome turns genetic code into a ‘readable language for discovery’
Robert Goldstone, head of genomics at Britain’s Francis Crick Institute, believes AlphaGenome is a “fundamental, high-quality tool that turns the static code of the genome into a readable language for discovery,” but warns that it is “not a silver bullet that will solve all biological questions.”
Despite these improvements, AlphaGenome still has many limitations. Like other models, it struggles to predict the effects of genetic changes located more than 100,000 base pairs away from the genes they influence, and it can only make predictions for the species whose data were used to train it: humans and mice.
Another challenge is interpreting the model’s results, explains Jian Zhou, a genomics machine learning researcher at the University of Chicago. “Even if models make accurate predictions, they do not necessarily tell us directly about the underlying biological processes,” he adds.
Google DeepMind released a preview of the model for non-commercial research last June. Since then, nearly 3,000 scientists in 160 countries have used AlphaGenome, submitting about 1 million requests every day, Kohli explains.
He expects “AlphaGenome will continue to be a valuable resource to the scientific community, helping scientists better understand genome function and disease biology, and ultimately driving new biological discoveries and…new treatments.”
