IIT Bombay's DrugProtai predicts protein drug properties with unprecedented accuracy



In drug discovery, the journey from a promising compound to an approved drug is full of challenges. A staggering 90% of drug candidates never reach patients. This high failure rate is often attributed to choosing the wrong protein targets in the human body, the proteins to which a drug must bind in order to act effectively. Traditional methods for finding “druggable” proteins combine experimental approaches, such as nuclear magnetic resonance, with computational methods that analyze protein characteristics. These methods are often slow, inaccurate and limited, demanding significant investments of money and time. The recent rise of machine learning and artificial intelligence tools has greatly accelerated this process, speeding up the discovery of new drugs.

In a further boost to this effort, a team of researchers from the Indian Institute of Technology (IIT) Bombay has developed an innovative new tool, DrugProtai. This computational framework is designed to predict whether proteins can be effectively targeted by drugs, giving researchers a powerful new ally in the fight against disease.

DrugProtai's novelty lies in its ability to analyze proteins using far more information than previous tools. Besides looking at the basic building blocks of proteins, the amino acids, DrugProtai also considers 183 different properties. These cover the physical and chemical properties of proteins, their sequence (the order of amino acids), their interactions with other proteins, their location within the cell, and how they can be modified after they are made. By drawing data from major biological resources such as UniProt, DrugBank, PubMed and AlphaFold (which supplies predicted 3D protein structures), this comprehensive approach builds a much richer picture of each protein.
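The article does not say how DrugProtai encodes each protein. As a minimal sketch, assuming a simple amino-acid composition encoding (a common sequence feature), the Python below turns a sequence into 20 frequencies and appends other numeric properties. The property names used with it are hypothetical placeholders, not DrugProtai's actual 183 features.

```python
from __future__ import annotations

from collections import Counter

# The 20 standard amino acids, as one-letter codes
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Fraction of each standard amino acid in a protein sequence.

    Non-standard letters (e.g. 'X') are ignored when computing fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def build_feature_vector(sequence: str, extra_properties: dict[str, float]) -> list[float]:
    """Concatenate composition features with other numeric properties.

    extra_properties is a hypothetical stand-in for annotations such as
    interaction counts or modification sites; keys are sorted so every
    protein's vector has the same column order.
    """
    composition = amino_acid_composition(sequence)
    features = [composition[aa] for aa in AMINO_ACIDS]
    features.extend(extra_properties[key] for key in sorted(extra_properties))
    return features
```

For example, `build_feature_vector("MKTAYIAK", {"num_interactions": 12, "num_ptm_sites": 3})` yields 20 composition values followed by the two extra properties, giving a fixed-length vector that any tabular classifier can consume.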

Did you know that only about 10% of drug candidates that enter clinical trials go on to win approval? The human body contains over 20,000 different proteins, but only a small portion of these are considered “druggable” targets for pharmaceuticals.

One of the biggest challenges in developing such tools is dealing with imbalanced data. In the human body, non-druggable proteins far outnumber druggable ones. AI models trained on such skewed data become biased and struggle to identify the rarer but important druggable proteins. DrugProtai addresses this challenge directly with a strategy known as the “partitioning-based ensemble method.” The large pool of non-druggable proteins is divided into smaller, more manageable groups. Multiple AI models are then trained, each learning from one of these small groups together with the complete set of druggable proteins. This gives each model a balanced view of the data and keeps it from overlooking the patterns that mark druggable proteins. The researchers found that two popular machine learning algorithms, Random Forest and XGBoost, work very well within this framework, achieving a median accuracy of 87% in predicting drug targets.
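The partitioning idea can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not DrugProtai's code: a toy nearest-centroid learner stands in for Random Forest or XGBoost, the models are combined by simple majority vote, and "median accuracy" is assumed to mean the median over the individual partition models. All function names are hypothetical.

```python
import random
from statistics import mean, median


def partition_negatives(negatives, n_parts, seed=0):
    """Shuffle the non-druggable examples and split them into n_parts near-equal groups."""
    if not 0 < n_parts <= len(negatives):
        raise ValueError("n_parts must be between 1 and the number of negatives")
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    # Striding through the shuffled list gives groups whose sizes differ by at most 1.
    return [shuffled[i::n_parts] for i in range(n_parts)]


def train_nearest_centroid(positives, negatives):
    """Toy base learner standing in for Random Forest / XGBoost:
    predict 1 (druggable) if x is closer to the positive-class centroid."""
    dim = len(positives[0])
    pos_c = [mean(x[j] for x in positives) for j in range(dim)]
    neg_c = [mean(x[j] for x in negatives) for j in range(dim)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return 1 if d_pos < d_neg else 0

    return predict


def train_partition_ensemble(positives, negatives, n_parts, seed=0):
    """One model per negative partition, each trained with ALL positives."""
    return [train_nearest_centroid(positives, part)
            for part in partition_negatives(negatives, n_parts, seed)]


def ensemble_predict(models, x):
    """Strict majority vote across partition models (1 = predicted druggable)."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes > len(models) else 0


def median_model_accuracy(models, X, y):
    """Median accuracy of the individual partition models on a labelled set."""
    accuracies = [sum(model(x) == t for x, t in zip(X, y)) / len(y) for model in models]
    return median(accuracies)
```

Because every model sees all the druggable examples but only a slice of the non-druggable ones, each is trained on a far less skewed class ratio than a single model fitted to everything; swapping `train_nearest_centroid` for a real classifier leaves the partitioning and voting logic unchanged.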

To check that the tool was truly robust, the team put DrugProtai through blind validation tests. They used it to predict the druggability of proteins that had only recently been approved as drug targets, proteins the model had never encountered during training. DrugProtai correctly identified 61 of 81 newly approved drug targets, demonstrating its real-world applicability and outperforming existing tools such as SPIDER and DrugTar.
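Because every protein in this blind test was a known (newly approved) target, the 61-of-81 result can be read as a recall, or sensitivity. This framing is an interpretation, not necessarily how the researchers report it, and it says nothing about how often the tool wrongly flags non-druggable proteins:

```latex
\text{recall} = \frac{\text{newly approved targets correctly identified}}{\text{newly approved targets tested}} = \frac{61}{81} \approx 0.753 \;(75.3\%)
```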

DrugProtai not only makes predictions but also helps researchers understand why a protein is considered druggable. Using a technique called SHAP (SHapley Additive exPlanations), the tool identifies the features that contribute most to a protein's druggability. For example, whether a protein is a kinase (a type of enzyme often involved in cell signaling), specific secondary structures, and the protein's “instability index” were found to be strong indicators. This interpretability is essential because it provides biological insight and allows researchers to make informed decisions rather than relying solely on black-box predictions. The team also explored deep learning methods, which can achieve slightly higher prediction scores but often cannot explain why they make a particular prediction; they argue that this is what makes DrugProtai's interpretable approach particularly valuable.
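SHAP builds on Shapley values from game theory: a feature's score is its average marginal effect on the prediction over all possible orders in which features could be "revealed". Libraries such as `shap` use fast approximations (for example a dedicated explainer for tree ensembles); the sketch below instead computes exact Shapley values by brute force, which is only feasible for a handful of features. It assumes features "absent" from a coalition take a baseline value, one common convention. The scoring function and its three features, named after the indicators above, are hypothetical and not DrugProtai's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for model(x), relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Runtime grows as 2**n, so this is for illustration with few features only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear druggability score over [is_kinase, helix_fraction, instability_index]
def toy_score(z):
    return 2.0 * z[0] + 1.0 * z[1] - 0.05 * z[2]
```

For instance, `shapley_values(toy_score, [1.0, 0.4, 30.0], [0.0, 0.3, 40.0])` attributes roughly 2.0, 0.1 and 0.5 to the three features, and these contributions add up to the difference between the protein's score and the baseline score, which is the "additive" property that gives SHAP its name.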

DrugProtai can streamline the identification of promising drug targets and significantly reduce the time and resources needed to develop new drugs. The tool will also be freely available online, making the technology accessible to researchers around the world. By providing a clear, fair and accurate method for assessing the druggability of proteins, DrugProtai is poised to accelerate the development of life-saving drugs, bringing us closer to a future where effective treatments are available for a wider range of diseases.


This article was written with the help of generative AI and edited by the editors of Research Matters.


