AI-powered protein design tools for biologists

Artificial intelligence has already shown that it can accelerate drug development and improve our understanding of disease. But turning AI into new treatments requires getting the latest and most powerful models into the hands of scientists.

The problem is that most scientists are not experts in machine learning. Today, OpenProtein.AI helps scientists stay on the cutting edge of AI with a no-code platform that provides access to a powerful foundation model and a set of tools for protein design, protein structure and function prediction, and model training.

Founded by Tristan Bepler PhD ’20 and Tim Lu PhD ’07, a former MIT associate professor, the company already provides tools, including foundation models for protein engineering developed in-house, to researchers at pharmaceutical and biotech companies of all sizes. OpenProtein.AI also makes its platform free to scientists in academia.

“This is a really exciting time, because these models can not only make protein engineering more efficient and shorten the development cycle for therapeutics and industrial applications, but also increase our ability to design new proteins with specific properties,” Bepler says. “We’re also thinking about applying these approaches to modalities other than proteins. The big picture is that we’re creating a language to describe biological systems.”

AI advances in biology

Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD program, studying under Bonnie Berger, MIT’s Simmons Professor of Applied Mathematics. There he realized how little we understood about the molecules that make up the building blocks of biology.

“We hadn’t characterized biomolecules and proteins well enough to be able to create good predictive models of, for example, what whole genomic circuits do or how protein interaction networks operate,” Bepler recalls. “I became interested in understanding proteins at a more detailed level.”

Bepler began exploring ways to predict the chains of amino acids that make up proteins by analyzing evolutionary data. This was before Google released AlphaFold, a powerful predictive model for protein structure. This research led to one of the first generative AI models for understanding and designing proteins, or what the team calls a protein language model.

“We were really excited about the classical framework of proteins and the relationships between their sequence, structure, and function, which we don’t really understand,” Bepler says. “So how can we use these foundational models to skip the ‘structural’ component and go directly from sequence to function?”

After receiving his PhD in 2020, Bepler joined Lu’s lab in MIT’s Department of Biological Engineering as a postdoctoral researcher.

“The idea of integrating AI with biology was just starting to pick up steam,” Lu recalls. “Tristan helped us build better computational models for biological design. He also realized that there was a disconnect between the cutting-edge tools available and the biologists who wanted to use them but didn’t know how to code. OpenProtein was born from the idea of expanding access to these tools.”

Bepler worked on the front lines of AI as part of his doctoral studies. He knew this technology could help scientists accelerate their research.

“We started with the idea of building a general-purpose platform for doing machine-learning-in-the-loop protein engineering,” Bepler says. “Machine learning ideas are kind of esoteric, so we wanted to build something user-friendly. It required implementation, GPUs, fine-tuning, and sequence library design. It was a lot to learn, especially for biologists at the time.”

In contrast, OpenProtein’s platform has an intuitive web interface that lets biologists upload data and perform protein engineering work using machine learning. It features a variety of open-source models, including PoET, OpenProtein’s flagship protein language model.

PoET (short for Protein Evolutionary Transformer) was trained on groups of related proteins to generate sets of related proteins. Bepler and his collaborators showed that the model can generalize about the evolutionary constraints on proteins and incorporate new information about protein sequences without retraining, and that other researchers can improve the model by adding experimental data.

“Researchers can use their own data to train models and optimize protein sequences, and then use our other tools to analyze those proteins,” says Bepler. “People are generating libraries of protein sequences in silico [on computers] and then running them through the predictive model to obtain validation and structure predictors. It’s basically a no-code front end, but we also have an API for those who want to access it in code.”

The model helps researchers design proteins more quickly and determine which proteins are promising enough for further laboratory testing. Researchers can also input proteins of interest, and the model can generate new proteins with similar properties.

Since its founding, the OpenProtein team has continued to add tools to the platform for researchers, regardless of laboratory size or resources.

“We’ve worked hard to make our platform an open-ended toolbox,” Bepler says. “It has a specific workflow, but it’s not specifically tied to one protein function or class of proteins. One of the great things about these models is that they’re really good at understanding proteins broadly. They learn about the entire space of possible proteins.”

Realizing next-generation treatments

Pharmaceutical giant Boehringer Ingelheim began using OpenProtein’s platform in early 2025. The companies recently announced an expanded collaboration that will incorporate OpenProtein’s platform and models into Boehringer Ingelheim’s research to develop proteins to treat diseases such as cancer, autoimmune diseases, and inflammatory conditions.

Last year, OpenProtein also released a new version of its protein language model, PoET-2. It performs better than much larger models while using a small fraction of the computing resources and experimental data.

“We really want to solve the problem of how to describe proteins,” Bepler says. “What is a meaningful domain-specific language to use when generating protein constraints? How can we introduce more evolutionary constraints? How can we describe the enzymatic reaction that a protein performs so that the model can generate the sequence to perform that reaction?”

In the future, the founders hope to create models that account for the changing, interconnected nature of protein function.

“An area that I’m excited about is using these models to predict and design dynamic features beyond protein binding events, where a protein must engage two, three, or four biological mechanisms at the same time, or change its function after binding,” says Lu, who currently serves in an advisory role at the company.

As advances in AI accelerate, OpenProtein’s mission remains to give scientists the best tools to develop new treatments faster.

“As studies become more complex, with approaches that incorporate protein logic and dynamic therapies, existing experimental toolsets become increasingly limited,” Lu says. “Building an open ecosystem around AI and biology is critical. There is a risk that AI resources will become too concentrated and unavailable to the average researcher. Open access is critical for scientific fields to advance.”


