Let’s talk about the language we are written in. DNA is essentially a four-letter alphabet made up of A, T, C, and G. Researchers at the University of Oregon have found a way to read this genetic code using artificial intelligence, much as chatbots read text.
Read DNA with AI


Andrew Kahn and his lab turned to AI to understand mutations in DNA. They took an older machine learning architecture, GPT-2, and trained it on simulated genetic data modeling evolution in a variety of species, including bacteria, rodents, mosquitoes, and primates.
“We can’t repeat evolution, so one of the key workflows we use is to develop simulations,” said Kevin Korfmann, lead author of the study. “Simulations mimic the evolutionary process, and we use the results as training data for deep learning models.”
The tool scans the genetic code for mutations and traces genes back to their most recent common ancestor.
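As a rough illustration of the simulate-then-train idea Korfmann describes, the toy Python sketch below evolves two DNA sequences from a shared ancestor under a simple per-site mutation rate and records how many generations separate them, producing (input, label) pairs a model could learn from. The mutation model, the function names (`mutate`, `simulate_pair`, `tokenize`, `make_training_set`), and all parameters are hypothetical assumptions for illustration; the study’s actual simulations are far more detailed.

```python
import random

ALPHABET = "ACGT"

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy seq for one generation; each site switches to a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Assumed toy model: pick one of the three other bases uniformly.
            base = rng.choice([b for b in ALPHABET if b != base])
        out.append(base)
    return "".join(out)

def simulate_pair(length: int, generations: int, rate: float, rng: random.Random):
    """Evolve two lineages independently from a random common ancestor."""
    ancestor = "".join(rng.choice(ALPHABET) for _ in range(length))
    a, b = ancestor, ancestor
    for _ in range(generations):
        a = mutate(a, rate, rng)
        b = mutate(b, rate, rng)
    return a, b

def tokenize(seq: str) -> list[int]:
    """Map each base to an integer token, the way a language model maps words to IDs."""
    return [ALPHABET.index(base) for base in seq]

def make_training_set(n: int, length: int = 200, rate: float = 1e-3, seed: int = 0):
    """Hypothetical training set: (tokens of a, tokens of b) labeled with generations since the common ancestor."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.randint(1, 500)  # assumed range of ancestor ages, in generations
        a, b = simulate_pair(length, t, rate, rng)
        data.append(((tokenize(a), tokenize(b)), t))
    return data

if __name__ == "__main__":
    for (a, b), t in make_training_set(5):
        diffs = sum(x != y for x, y in zip(a, b))
        print(f"generations={t:3d}  differing sites={diffs}")
```

Because differences between the two lineages tend to accumulate with time, a model trained on many such labeled pairs can learn to map patterns in the sequences to an age; the costly simulation happens once, before training.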
“Advances in generative AI and the architectures behind it could be useful in many areas beyond chatbots,” said Andrew Kahn, Evergreen Professor of Biology.
In testing, the AI model performed about as well as traditional statistical methods, but far faster: where conventional math-based methods can take hours or even days to analyze a single mosquito chromosome, the new tool does it in minutes.
“Compared to classical inference approaches, AI tools do not need to reason about every mutation individually,” Korfmann added. “They just read the patterns. All the expensive statistical work is done upfront, during training, which avoids the bottleneck.”
“When you borrow technology from a completely different world and apply it to a new problem, you never know what will work,” Kahn said. “But this was a case where things worked out very well.”
Impact on disease management
Tools like this can help scientists determine when a species developed a particular trait or when disease-resistance genes emerged. Take malaria as an example. For years, scientists used insecticides to control the mosquito populations that carry it, until the mosquitoes developed resistance.
“We are now seeing insecticide resistance in all of these mosquito populations,” Kahn explained.
“A major challenge in preventing the spread of malaria is understanding the evolution of insecticide resistance,” Kahn added. “Now, using AI models, we can ask how long ago these resistance genes arose in the population and learn about the evolutionary history of this important malaria carrier.”
The model also works with incomplete DNA datasets, a common problem for researchers. Looking ahead, Kahn and Korfmann hope to use machine learning to build complete family trees across multiple lineages.
“There are a lot of things in machine learning that haven’t yet been applied to our field,” Korfmann said. “There is a lot of translation work that needs to be done to make these new algorithms work in biology.”
