Artificial intelligence algorithms have had a meteoric impact on protein structure prediction, most notably when DeepMind’s AlphaFold2 predicted the structures of 200 million proteins. Now, David Baker and a team of biochemists at the University of Washington have taken protein-folding AI one step further. In a February 22 paper in Nature, they outlined how AI can be used to design bespoke, functional proteins that can be synthesized and produced within living cells, creating new opportunities for protein engineering. Ali Madani, founder and CEO of Profluent, a company that uses other AI technologies to design proteins, said the research shows that protein design “has come a long way” and that the field is witnessing rapid growth into “new frontiers.”
Proteins are made of different combinations of amino acids linked into folded chains, giving rise to a nearly limitless variety of 3D shapes. Because so many factors govern protein folding, such as the sequence and length of the amino acid chain, interactions with other molecules, and sugars added to the protein’s surface, predicting a protein’s 3D structure from its sequence alone is beyond the reach of the human mind. Instead, for decades, scientists have determined protein structures experimentally, using techniques such as X-ray crystallography, which resolves a protein’s fold in atomic detail by diffracting X-rays through a crystallized sample. However, such methods are expensive, time-consuming, and dependent on skilled execution. Still, scientists using these techniques have solved the structures of many thousands of proteins, creating a wealth of data that can be used to train AI algorithms to predict the structures of other proteins. DeepMind used its AlphaFold system to demonstrate that machine learning can predict protein structures from amino acid sequences, and it famously improved accuracy by training the successor system, AlphaFold2, on 170,000 protein structures.
On the same day that the AlphaFold2 paper was published, Baker and his colleagues released an independent, freely accessible alternative, known as RoseTTAFold, that predicts protein structure with accuracy comparable to AlphaFold2’s.
Since then, Baker and his team have investigated whether machine learning can be used in reverse: to generate amino acid sequences for new, never-before-seen proteins with industrial or medical potential. Protein engineering has largely depended on experiments that make incremental changes to existing proteins, such as introducing random mutations into the genes encoding a protein of interest and screening the resulting variants for desirable properties. With AI, Baker says, “we can design faster and better than ever.”
To test their protein design strategy, they turned to a group of light-producing enzymes called luciferases (from the Latin lucifer, “bearer of light”). These enzymes produce light when they bind small molecules called luciferins, and they are found in many organisms, from fireflies to marine creatures living in the pitch-black deep sea.
Unlike fluorescent proteins, luciferases do not require an external excitation light source, which makes them useful for imaging deep inside animal tissues. However, very few luciferases have been found in nature. Most are unstable and tend to work better with natural luciferins than with synthetic luciferins engineered to have favorable properties. These factors have hampered efforts to use luciferases in scientific applications and to engineer improved versions of the enzymes.
Using a mix of AI systems, including AlphaFold2, ProteinMPNN, and trRosetta, the researchers set out to invent luciferase amino acid sequences that would both bind synthetic luciferins and remain stable. Because natural luciferases bind poorly to synthetic luciferins, the team used machine learning to predict how 4,000 other proteins known to bind small molecules would stack up by comparison. One group stood out: the superfamily of nuclear transport factor 2 (NTF2)-like proteins. The algorithms revealed that members of this superfamily share a pocket that can hold synthetic luciferins. Having found a structure that could bind synthetic luciferin, the team turned to stability. Unfortunately, NTF2-like proteins contain long loops of amino acids that can be prone to misfolding in synthetic hybrid proteins. However, the loops are not essential for luciferase activity, so the researchers used machine learning algorithms to replace them with other, more stable combinations of amino acids.
Ultimately, by combining AI techniques, the team created 7,648 custom protein designs that don’t exist in nature but might be able to do what the researchers wanted. The researchers then had to narrow these down to the best few. They expressed each design in Escherichia coli bacteria, treated the cells with the synthetic luciferin, and checked which cells emitted light. Only 3 designs (0.04%) worked.
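The 0.04% figure follows directly from the counts reported above. As a quick arithmetic check, the short Python sketch below recomputes it from those two numbers alone (3 working designs out of 7,648 tested); the helper name hit_rate_percent is hypothetical and is used only for illustration.

```python
# Arithmetic check of the first-round success rate, using only the counts
# reported in the article: 3 light-emitting designs out of 7,648 tested.
# hit_rate_percent is a hypothetical helper, not part of any published tool.

def hit_rate_percent(hits: int, tested: int) -> float:
    """Return the share of tested designs that worked, as a percentage."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * hits / tested

first_round = hit_rate_percent(3, 7648)

print(f"{first_round:.4f}%")        # unrounded: 0.0392%
print(f"{first_round:.2f}%")        # rounded to the article's precision: 0.04%
print(f"about 1 in {7648 / 3:,.0f} designs worked")  # about 1 in 2,549
```

Put another way, only about one design in every 2,500 lit up, which is why screening thousands of candidates was necessary in the first round.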
Enzyme design is an extremely difficult task that demands great precision, says Madani, so “any success is very impressive.” At Profluent, Madani is working on ProGen, a separate AI workflow for protein design, which he says has a “50%+ hit rate.” But comparing the two approaches is like comparing apples and oranges, he added, because each is suited to customizing different kinds of proteins.
Determined to optimize their workflow, the team applied what they had learned in the first round to design luciferases for other synthetic luciferins with different shapes, raising the success rate to as much as 4% across 46 candidate designs. Andy Hsien-Wei Yeh, a postdoctoral researcher in the Baker lab, said the first round helped the team understand “what geometric shapes yield luciferase” and what the algorithm should take into account, which narrowed down the number of candidate sequences. Baker and Yeh have since spun out a diagnostic biosensor company called Monod Bio, which has licensed the synthetic luciferases.
Protein design is not yet fully automated. “There’s still room for improvement,” says Baker, noting that manual sequence modifications are still required to complete the active site of the luciferase enzyme. But he hopes AI will be able to handle the whole design process “soon.” The luciferase reaction is also relatively easy to mimic, he said, adding, “There remains work to be done to see how well this approach works for more difficult chemistries.”
Going forward, Baker and his team plan to use another AI system they developed, called RFdiffusion, to streamline protein design and to invent synthetic proteins for a nasal spray that blocks influenza viruses from attaching to host cells. Because the algorithm is expected to produce highly stable proteins, Baker hopes the spray will have a long shelf life and could be used routinely during winter to ward off infections. Beyond blocking respiratory viruses, Baker said, the algorithm could one day be used to design new biomaterials, stable plastic-degrading enzymes, and proteins that capture solar energy.