Introducing CancerGPT: A proposed model that uses large language models to predict the synergistic effects of drug pairs in specific tissues in a few-shot setting



Source: https://arxiv.org/pdf/2304.10946.pdf

The latest iterations of artificial intelligence are built on foundation models. Rather than building a separate AI model for each specific task, a foundation or “generalist” model can be applied to many downstream tasks without task-specific training. The large pre-trained language models GPT-3 and GPT-4, for example, have revolutionized foundation models in AI. LLMs can use few-shot or zero-shot learning to apply their knowledge to new tasks they were not explicitly trained on. This is partly due to multitask learning, which allows an LLM to learn incidentally from the implicit tasks in its training corpus.

LLMs have shown proficiency in few-shot learning in several domains, such as computer vision, robotics, and natural language processing, but their generalizability to unseen problems in more complex domains like biology has not yet been fully explored. Inferring unobserved biological responses requires an understanding of the entities involved and the underlying biological systems. Most of this information lives in free-text literature, which may be used to train LLMs, while only a small amount is captured in structured databases. Researchers from the University of Texas, the University of Massachusetts Amherst, and the University of Texas Health Science Center believe that LLMs, which extract prior knowledge from unstructured literature, can be a creative way to tackle biological prediction challenges where structured data are lacking and sample sizes are small.

A key issue in such few-shot biological predictions is the prediction of synergistic effects of drug pairs in under-explored cancer types. Combinations of drugs in therapy are now a common method for managing difficult-to-treat conditions such as cancer, infections, and neurological disorders. Combination therapy often yields better outcomes than monotherapy. Research on drug discovery and development is increasingly focused on predicting the synergistic effects of drug pairs. Synergy of drug pairs indicates that using two drugs together has a greater therapeutic effect than using each individually. Due to the large number of potential combinations and complexities of underlying biological systems, it is not easy to predict the synergistic effects of drug pairs.
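To make the definition of synergy concrete, here is a minimal sketch based on the Bliss independence model, one common reference model for combination effects. It is used purely for illustration: the article does not say which synergy score CancerGPT relies on, and the function names and inhibition values below are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined fractional inhibition if the two drugs act independently."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a: float, effect_b: float, observed_combo: float) -> float:
    """Observed minus expected effect; a positive value suggests synergy."""
    return observed_combo - bliss_expected(effect_a, effect_b)


if __name__ == "__main__":
    # Hypothetical measurements: each drug alone, then the pair together.
    excess = bliss_excess(effect_a=0.30, effect_b=0.40, observed_combo=0.70)
    verdict = "synergistic" if excess > 0 else "not synergistic"
    print(f"Bliss excess: {excess:.2f} ({verdict})")
```

With these made-up numbers, independent action would predict 58% inhibition, so an observed 70% means the pair does better than the two drugs would be expected to do on their own.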


Several computational techniques, particularly machine learning, have been developed to predict the synergistic effects of drug pairs. A large dataset of in vitro experimental results on drug combinations can be used to train machine learning algorithms to spot patterns and predict the synergistic potential of new drug pairs. However, most of the available data concern common cancer types in a few tissues, such as breast and lung cancer, whereas relatively little experimental data is available for other tissues, such as bone and soft tissue. Obtaining cell lines from these tissues is physically demanding and expensive, which limits the amount of training data available for predicting drug pair synergy. Machine learning models that rely on large datasets may therefore struggle to train.

Early studies ignored biological and cellular differences between tissues and extrapolated synergy scores to cell lines in other tissues based on relational or contextual information. Another line of research has attempted to reduce the disparities between tissues by using various high-dimensional data, such as genomic or chemical profiles. Despite promising results in some tissues, these techniques cannot be applied to tissues that lack the data needed to adapt models with the many parameters these high-dimensional features require. In this work, the researchers aim to address these problems with LLMs. They argue that the scientific literature still contains useful information about cancer types for which structured data are sparse and inconsistently characterized.

Manually gathering predictive information on such biological entities from the literature is not easy. Their novel strategy is to harness the prior knowledge from the scientific literature that is stored in LLMs. They turned the prediction task into a natural language inference problem and built a model that generates a response based on the knowledge embedded in the LLM, which they describe as a few-shot drug pair synergy prediction model. Their experimental results show that the LLM-based few-shot prediction model outperforms strong tabular prediction models in most scenarios and achieves significant accuracy even in the zero-shot setting. This strong few-shot performance on one of the most difficult biological prediction tasks is of critical and timely relevance to the broad biomedical community, as it demonstrates the high potential of “generalist” biomedical artificial intelligence.
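As a rough illustration of what turning a tabular prediction task into a natural language problem might look like, the sketch below converts a row of drug pair data into a sentence and builds a zero-shot or few-shot prompt from it. The field names, the prompt wording, and the placeholder drug and cell line names are all assumptions made for this example; they are not taken from the paper, and no real LLM API is called.

```python
from typing import Any, Mapping, Sequence

# Hypothetical question wording; the paper's actual template may differ.
QUESTION = "Is the drug combination synergistic? Answer yes or no."


def row_to_sentence(row: Mapping[str, Any]) -> str:
    """Describe one tabular record (drug pair, cell line, tissue) in plain text."""
    return (
        f"The first drug is {row['drug_a']}. "
        f"The second drug is {row['drug_b']}. "
        f"The cell line is {row['cell_line']}. "
        f"The tissue is {row['tissue']}."
    )


def build_prompt(query: Mapping[str, Any],
                 examples: Sequence[Mapping[str, Any]] = ()) -> str:
    """Build a prompt: labeled examples first (few-shot), then the unlabeled query.

    With no examples, the result is a zero-shot prompt.
    """
    lines = []
    for example in examples:
        label = "yes" if example["synergistic"] else "no"
        lines.append(f"{row_to_sentence(example)} {QUESTION} {label}")
    lines.append(f"{row_to_sentence(query)} {QUESTION}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder records, not real experimental data.
    shots = [
        {"drug_a": "drug_X", "drug_b": "drug_Y", "cell_line": "cell_line_1",
         "tissue": "bone", "synergistic": True},
        {"drug_a": "drug_X", "drug_b": "drug_Z", "cell_line": "cell_line_2",
         "tissue": "bone", "synergistic": False},
    ]
    query = {"drug_a": "drug_Y", "drug_b": "drug_Z",
             "cell_line": "cell_line_1", "tissue": "bone"}
    print(build_prompt(query, shots))
```

The resulting string would then be passed to a language model, and its yes/no answer read back as the prediction. Keeping the examples optional lets the same function serve both the zero-shot and few-shot settings described above.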




Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing a Bachelor’s Degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time on projects aimed at harnessing the power of machine learning. His research interest is image processing and his passion is building solutions around it. He loves connecting with people and collaborating on interesting projects.


