AI and large DNA libraries speed up genetic circuit design


Scientists have developed a new technique that could change the way DNA is designed for therapeutic and biotechnological applications.

Designing DNA has long been a central challenge in synthetic biology. Scientists can program cells to behave in a certain way, but identifying the DNA sequences that produce that behavior is very difficult.

“There are many possible designs for a particular function, and finding the right design is like looking for a needle in a haystack,” said Caleb Bashor, a scientist at Rice University.

Now, Rice University researchers have reported a solution that massively scales a critical part of the design process. The research leverages machine learning and a large library of DNA designs to better predict which sequences will make cells behave the way scientists want them to.

Significantly enhancing genetic design

The new technology is called “CLASSIC,” an acronym for “combining long- and short-range sequencing to investigate genetic complexity.” CLASSIC allows scientists to create hundreds of thousands to millions of DNA designs at once, far more than was previously possible.

“We have developed new techniques that allow us to make hundreds of thousands to millions of DNA designs at once, more than ever before,” Bashor said.

Bashor is an assistant professor of bioengineering and biological sciences and associate director of the Rice Synthetic Biology Institute at Rice University.

The goal is to map DNA sequences, also known as genetic circuits, to the behaviors that occur within cells. To do so, the team built a large circuit library and connected each circuit to how it functions inside human cells.

The key to CLASSIC’s success is the combination of two sequencing approaches. Long-read sequencing reads thousands of bases at a time and captures the complete circuit design. Short-read sequencing is faster and more accurate over short ranges.

“Most people use one or the other, but we found that using both together unlocks the ability to build and test libraries,” said co-lead author Ronan O’Connell.

The researchers used these approaches to tag every circuit with a barcode and track it. They inserted the circuits into human embryonic kidney cells, with each circuit engineered to make its cell light up when a specific gene was activated. Brighter cells indicate stronger activity.

Short-read sequencing then identified the barcodes of the circuit designs within each cell group. This allowed the team to link DNA sequences to results and build large datasets.
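The linking step described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the team's actual pipeline: the barcodes, part names, read counts, and the idea that cell groups are numbered by brightness are all made up here. It joins a barcode-to-design table (the kind of information long reads provide) with barcode counts per cell group (the kind of information short reads provide) to give each design a single activity score.

```python
from collections import defaultdict

# Hypothetical long-read result: barcode -> complete circuit design.
# Barcodes and part names are invented for illustration.
barcode_to_design = {
    "AAGT": ("promoterA", "kozak1", "terminatorX"),
    "CCTA": ("promoterB", "kozak1", "terminatorX"),
    "GGAC": ("promoterA", "kozak2", "terminatorY"),
}

# Hypothetical short-read result: (cell group, barcode, read count).
# Assumption: cell groups are numbered by brightness, 0 = dimmest.
short_read_counts = [
    (0, "AAGT", 90), (1, "AAGT", 10),
    (1, "CCTA", 50), (2, "CCTA", 50),
    (2, "GGAC", 100),
    (2, "TTTT", 5),  # barcode with no long-read match; skipped below
]

def link_designs_to_activity(barcode_to_design, short_read_counts):
    """Return design -> read-weighted mean cell group (a crude activity score)."""
    total_reads = defaultdict(int)
    weighted_sum = defaultdict(int)
    for group, barcode, count in short_read_counts:
        design = barcode_to_design.get(barcode)
        if design is None:
            continue  # cannot link this barcode to a known design
        total_reads[design] += count
        weighted_sum[design] += group * count
    return {d: weighted_sum[d] / total_reads[d] for d in total_reads}

activity = link_designs_to_activity(barcode_to_design, short_read_counts)
for design, score in sorted(activity.items(), key=lambda item: item[1]):
    print(design, round(score, 2))
```

In this toy data, the design whose barcode shows up mostly in the dimmest group gets the lowest score, and the design found only in the brightest group gets the highest.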

These datasets can be used to train machine learning models to understand which DNA designs are most likely to produce desired results. This allows teams to predict the performance of designs that they have not yet physically tested.
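To make the idea of predicting untested designs concrete, here is a minimal sketch. It is a deliberately simple stand-in, not the model used in the study: it assumes each part adds a fixed amount to brightness (real circuits need not behave this way), and the part names and measurements are invented. A linear model is fitted by gradient descent on five measured designs and then asked about a sixth design that was never measured.

```python
# Toy measurements, invented for illustration:
# (promoter, terminator) -> measured brightness.
measured = {
    ("P1", "T1"): 1.0,
    ("P1", "T2"): 1.5,
    ("P2", "T1"): 2.0,
    ("P2", "T2"): 2.5,
    ("P3", "T1"): 3.0,
}
PARTS = ["P1", "P2", "P3", "T1", "T2"]

def encode(design):
    """One-hot encoding: 1.0 for each part present in the design."""
    return [1.0 if part in design else 0.0 for part in PARTS]

def predict(weights, design):
    """Predicted brightness is the sum of the weights of the parts used."""
    return sum(w * x for w, x in zip(weights, encode(design)))

def train(measured, learning_rate=0.5, steps=2000):
    """Fit one weight per part by gradient descent on squared error."""
    weights = [0.0] * len(PARTS)
    n = len(measured)
    for _ in range(steps):
        grad = [0.0] * len(PARTS)
        for design, brightness in measured.items():
            error = predict(weights, design) - brightness
            for i, x in enumerate(encode(design)):
                grad[i] += error * x / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights

weights = train(measured)
untested = ("P3", "T2")  # this combination was never measured
prediction = predict(weights, untested)
print(untested, round(prediction, 2))  # about 3.5 for this toy data
```

The prediction works here only because the toy data were built to be additive. The article's point about scale applies: the richer the behavior, the more measured designs a model needs before its predictions for unmeasured ones can be trusted.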

“We use this data to train a model that can understand this situation and make predictions where we couldn’t generate data,” O’Connell said.

Initial validation showed high accuracy. When the researchers compared the model’s predictions with 40 manually tested sequences, all 40 matched.

AI design is the future

The scale of the data generated by CLASSIC is key. This provides the machine learning model with enough information to make reliable predictions. Without that scale, the model is not accurate enough.

“This is the first time we have been able to use AI/ML to analyze circuits and make accurate predictions for untested circuits, because up until this point, no one had been able to build a library as large as ours,” said co-lead author Kshitij Rai.

The researchers also found that circuits often have many working solutions, rather than one optimal design. This flexibility could help engineers design more robust biological systems.

The researchers say this combination of high-throughput data and AI modeling has the potential to accelerate the development of cell-based therapies and other synthetic biology applications. The study was published in the journal Nature.


