Generative AI imagines new protein structures | Massachusetts Institute of Technology News



Biology is a wonderfully delicate tapestry. At its core is DNA, the master weaver that encodes proteins, which orchestrate the many biological functions that sustain life in the human body. However, our bodies are like finely tuned instruments that can easily fall out of tune. After all, we face an ever-changing and unforgiving natural world of pathogens, viruses, diseases, and cancers.

Imagine if we could speed up the process of creating vaccines and medicines against emerging pathogens. What if we had a gene-editing technology that could automatically generate proteins to correct the DNA errors that cause cancer? Designing such proteins is essential for medicine, diagnostics, and numerous industrial applications, yet it is often a long and costly endeavor.

To advance the power of protein engineering, MIT CSAIL researchers have devised “FrameDiff,” a computational tool for creating new protein structures beyond what nature has produced. The machine-learning approach generates “frames” that align with the inherent properties of protein structures, enabling it to build new proteins independently of existing designs and to construct protein structures never seen before.

“In nature, protein design is a slow-burning process that takes millions of years,” said MIT CSAIL PhD student Jason Yim, the work’s lead author. “This new ability to generate synthetic protein structures opens up a myriad of enhanced capabilities, such as better binders: proteins engineered to bind to other molecules more efficiently and selectively, with wide-ranging implications for targeted drug delivery and biotechnology, where it could lead to the development of better biosensors. It also offers possibilities such as developing more efficient photosynthetic proteins, creating more effective antibodies, and engineering nanoparticles for gene therapy.”

Framing FrameDiff

Proteins have complex structures, made up of many atoms held together by chemical bonds. The most important atoms that determine a protein’s 3D shape are called the “backbone,” which acts like the protein’s spine. Every triplet of atoms along the backbone shares the same pattern of bonds and atom types. The researchers noticed that this pattern could be exploited to build machine-learning algorithms using ideas from differential geometry and probability. This is where frames come in: mathematically, these triplets can be modeled as rigid bodies called “frames” (common in physics), each with a 3D position and rotation.
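To make the idea of a frame concrete, here is a minimal NumPy sketch of turning one backbone triplet into a rigid frame. It assumes, as is common in frame-based protein models such as AlphaFold2, that the triplet is a residue’s N, Cα and C atoms and that the frame is anchored at Cα. The function names and the Gram-Schmidt axis convention are illustrative choices, not FrameDiff’s code.

```python
import numpy as np

def frame_from_backbone(n, ca, c):
    """Build a rigid frame (R, t) from one residue's N, CA and C coordinates.

    R is a 3x3 rotation matrix whose columns are orthonormal axes; t is the CA position.
    (Axis convention is an illustrative assumption.)
    """
    n, ca, c = (np.asarray(x, dtype=float) for x in (n, ca, c))
    v1 = c - ca                        # first axis points from CA toward C
    v2 = n - ca                        # helper vector from CA toward N
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - np.dot(e1, v2) * e1      # Gram-Schmidt: remove the component along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)              # right-handed third axis, so det(R) = +1
    return np.stack([e1, e2, e3], axis=1), ca

def to_local(R, t, points):
    """Express global points in the frame's local coordinates: R^T (x - t)."""
    return (np.asarray(points, dtype=float) - t) @ R

def to_global(R, t, local_points):
    """Map local coordinates back to global ones: R x_local + t."""
    return np.asarray(local_points, dtype=float) @ R.T + t

# Usage with made-up coordinates (angstroms) for a single residue.
N, CA, C = [1.458, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.551, 1.420, 0.0]
R, t = frame_from_backbone(N, CA, C)
print(np.round(to_local(R, t, [N, CA, C]), 3))
```

The frame stores only a rotation and a position, yet together with the shared bond pattern that is enough to locate the whole triplet.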

These frames give each triplet enough information about its spatial surroundings. The task is then for a machine-learning algorithm to learn how to move each frame to construct a protein backbone. By learning to construct existing proteins, the algorithm will hopefully generalize and be able to create new proteins never seen before in nature.
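Because every triplet shares the same bond pattern, a backbone can in principle be rebuilt from its frames alone by placing one local atom template into each frame. The sketch below assumes approximate idealized local coordinates for N, Cα and C (illustrative values in ångströms, not taken from FrameDiff) and shows how a rigid update moves a frame, and all of its atoms, without distorting bond lengths. `IDEAL_LOCAL`, `place_backbone` and `move_frames` are hypothetical names.

```python
import numpy as np

# Approximate idealized backbone coordinates (angstroms) in a residue's local frame:
# CA at the origin, C along +x, N in the xy-plane. Values are illustrative assumptions.
IDEAL_LOCAL = np.array([
    [-0.525, 1.363, 0.0],   # N
    [0.0, 0.0, 0.0],        # CA
    [1.526, 0.0, 0.0],      # C
])

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def place_backbone(rotations, translations):
    """Given n frames (R_i, t_i), return an (n, 3, 3) array of N/CA/C coordinates."""
    # x_global = R_i @ x_local + t_i for every atom of every residue
    return np.einsum("nij,aj->nai", rotations, IDEAL_LOCAL) + translations[:, None, :]

def move_frames(rotations, translations, delta_R, delta_t):
    """Apply per-residue rigid updates expressed in each frame's local coordinates."""
    # Composing x -> R (dR x + dt) + t gives rotation R dR and translation R dt + t.
    new_R = rotations @ delta_R
    new_t = translations + np.einsum("nij,nj->ni", rotations, delta_t)
    return new_R, new_t

# Usage: two residues with identity orientation, CA atoms 3.8 angstroms apart (made up).
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]])
atoms = place_backbone(R, t)  # shape (2, 3, 3): residue, atom (N/CA/C), xyz
R2, t2 = move_frames(R, t, np.stack([rotation_z(np.pi / 2)] * 2), np.zeros((2, 3)))
moved = place_backbone(R2, t2)
```

In this picture, a generative model only has to output rotations and translations; the atoms within each triplet follow rigidly.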

Training a model to build proteins via “diffusion” involves injecting noise that randomly moves every frame and blurs what the original protein looked like. The algorithm’s job is to move and rotate each frame until it looks like the original protein. Though simple in concept, developing diffusion over frames requires techniques from stochastic calculus on Riemannian manifolds. On the theory side, the researchers developed “SE(3) diffusion” for learning probability distributions that connect the translational and rotational components of each frame.
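The forward “noising” half of diffusion over frames can be sketched as follows. This is a simplified stand-in rather than the researchers’ SE(3) diffusion: translations receive ordinary variance-preserving Gaussian noise, and each rotation is perturbed by exponentiating a Gaussian rotation vector (Rodrigues’ formula), which only approximates the Gaussian-like distributions on the rotation group that a rigorous treatment uses. The schedule parameters `beta` and `sigma_rot` are hypothetical.

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix K such that K @ x == np.cross(v, x)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(omega):
    """Rodrigues' formula: map a rotation vector (axis * angle) to a rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3) + skew(omega)  # first-order approximation near the identity
    K = skew(omega / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rotation_angle(R1, R2):
    """Geodesic distance on SO(3): the angle of the relative rotation R1^T R2."""
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def noise_frames(rotations, translations, t, rng, sigma_rot=1.5, beta=5.0):
    """Forward-noise n frames to diffusion time t in [0, 1] (hypothetical schedule)."""
    n = translations.shape[0]
    # Translations: variance-preserving Gaussian noising that blurs positions toward N(0, I).
    alpha = np.exp(-beta * t)
    noisy_trans = np.sqrt(alpha) * translations + np.sqrt(1.0 - alpha) * rng.standard_normal((n, 3))
    # Rotations: compose each frame with a random rotation whose typical angle grows with t.
    omegas = sigma_rot * np.sqrt(t) * rng.standard_normal((n, 3))
    noisy_rots = np.stack([rotations[i] @ exp_so3(omegas[i]) for i in range(n)])
    return noisy_rots, noisy_trans

# Usage with made-up frames: how far did each orientation drift at t = 0.5?
rng = np.random.default_rng(0)
rots = np.stack([np.eye(3)] * 4)
trans = rng.standard_normal((4, 3))
noisy_rots, noisy_trans = noise_frames(rots, trans, t=0.5, rng=rng)
print([round(float(rotation_angle(a, b)), 3) for a, b in zip(rots, noisy_rots)])
```

A denoising network would be trained to reverse this process; `rotation_angle` is one natural way to measure how far a predicted orientation is from the original one.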

The delicate art of diffusion

In 2021, DeepMind introduced AlphaFold2, a deep-learning algorithm for predicting the 3D structures of proteins from their sequences. Creating a synthetic protein involves two essential steps: generation and prediction. “Generation” means creating a new protein structure or sequence, while “prediction” means figuring out what a sequence’s 3D structure looks like. It’s no coincidence that AlphaFold2 also used frames to model proteins. SE(3) diffusion and FrameDiff were inspired to take the idea of frames further by incorporating them into diffusion models, a generative AI technique that has become immensely popular in image generation, as in Midjourney.

Because protein structure generation and prediction share frames and principles, the best models on both ends are compatible. In collaboration with the Institute for Protein Design at the University of Washington, SE(3) diffusion is already being used to create and experimentally validate novel proteins. Specifically, they combined SE(3) diffusion with RoseTTAFold2, a protein structure prediction tool much like AlphaFold2, to produce “RFdiffusion.” This new tool has brought protein designers closer to solving crucial problems in biotechnology, including developing highly specific protein binders to accelerate vaccine design, engineering symmetric proteins for gene delivery, and robust motif scaffolding for precise enzyme design.

Future work on FrameDiff includes improving its generality for problems that combine multiple requirements for biologics such as drugs. Another extension is to generalize the models to all biological modalities, including DNA and small molecules. The team posits that by training FrameDiff on more substantial data and enhancing its optimization process, it could generate foundational structures with design capabilities on par with RFdiffusion, while preserving FrameDiff’s inherent simplicity.

“Discarding pretrained structure prediction models [in FrameDiff] …,” said Harvard computational biologist Sergey Ovchinnikov. “The researchers’ innovative approach offers a promising step toward overcoming the limitations of current structure prediction models. It’s still preliminary work, but it’s an encouraging step in the right direction. As such, the vision of protein design, which plays a pivotal role in addressing humanity’s most pressing challenges, seems increasingly within reach thanks to the pioneering work of this MIT research team.”

Yim co-authored the paper with Columbia University postdoc Brian Trippe; Valentin de Bortoli, a researcher at the Paris Data Science Center of the French National Center for Scientific Research; University of Cambridge postdoc Emile Mathieu; and Arnaud Doucet, a professor of statistics at the University of Oxford and a senior research scientist at DeepMind. MIT professors Regina Barzilay and Tommi Jaakkola advised the research.

The team’s research was supported in part by the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, an EPSRC grant, a Prosperity Partnership between Microsoft Research and the University of Cambridge, the National Science Foundation Graduate Research Fellowship Program, an NSF Expeditions grant, the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats program, the DARPA Accelerated Molecular Discovery program, and the Sanofi Computational Antibody Design grant. The research will be presented at the International Conference on Machine Learning in July.


