
The preclinical phase of drug discovery is the longest stage in the R&D lifecycle, taking up to six years and accounting for over 40% of total drug development costs. To reduce the billions of dollars spent on preclinical drug development, faster and more efficient R&D workflows must be prioritized across the industry. So it's no wonder that pharmaceutical and biotech companies are turning to machine learning (ML) and AI to revolutionize R&D and to generate and validate their small molecule drug discovery pipelines.
Research institutions that have successfully adopted AI are already gaining a competitive advantage. Evidence is emerging that these organizations can get through the preclinical stage faster and more cheaply than those using traditional approaches, saving about 30% in time and cost. The approach is also gaining traction: one study by the Boston Consulting Group found that biotech companies that have adopted an AI-first approach “have more than 150 small molecule drugs in exploration, with more than 15 already in clinical trials.”
Predictive AI is one AI approach currently being explored by many pharmaceutical and biotech companies. Here are five steps research leaders can follow to achieve success.
Before investing in predictive AI, research leaders need to define the problem, or use case, they want to address. Typically, predictive AI is best used for discrete tasks and processes where measurable, tangible benefits can be achieved. Examples of predictive AI use cases in early drug discovery include predicting 3D structures of proteins, relationships between molecules based on chemical structure, and drug-target interactions.
In small molecule discovery, predictive retrosynthesis combines high-quality reaction data with AI to find structural or chemical patterns that correlate with the properties of specific compounds, which accelerates synthesis planning for novel molecular entities. The potential advantages of predictive retrosynthesis over traditional approaches are significant: it can generate pathways for novel compounds in minutes instead of weeks.
The nuances of research questions in drug discovery demand a level of accuracy that requires high-quality, validated training data. Without accurate, high-quality data, researchers cannot have confidence in the results of predictive AI. To make predictive models work, researchers need to include data from multiple sources in addition to their in-house data. This typically includes data from the scientific literature as well as other databases that contain patent data, regulatory data, clinical trial data, safety data, and patient record data.
For example, predictive AI chemistry models require a wide range of chemistry inputs, including published literature as well as proprietary data and data from failed reactions. Predictive models that are fine-tuned on incomplete data can produce inferior results whose shortcomings are not immediately apparent, leading to costly incorrect decisions.
Once data is captured, it needs to be structured to effectively leverage predictive AI. Much of the data R&D organizations collect is not AI-ready. Datasets are siloed and stored in various formats with poor metadata, making them difficult to retrieve and use in predictive AI models. Applying ontologies to standardize and structure datasets is a critical step.
An ontology is a human-generated, machine-readable description of a category. It standardizes data based on an agreed-upon vocabulary, providing a common language across an organization. The vocabulary can include industry-recognized concepts and terms, as well as organization-specific terms, such as product names. Ontologies define semantic relationships with other classes and capture synonyms. This is essential when there are multiple ways to describe the same entity in scientific literature or other datasets. For example, the gene PSEN1 is also called PS-1 or presenilin-1.
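To make the idea concrete, here is a minimal sketch of an ontology as a data structure: each class has a preferred label, a set of synonyms, and "is a" links to parent classes. The `Concept` and `Ontology` names and the tiny two-class vocabulary are illustrative assumptions, not a standard or a real ontology format; production ontologies are typically stored in formats such as OWL or OBO and are far larger.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology class: preferred label, synonyms, and parent classes."""
    label: str
    synonyms: set = field(default_factory=set)
    parents: set = field(default_factory=set)  # "is a" relationships


class Ontology:
    """Hypothetical in-memory ontology, for illustration only."""

    def __init__(self):
        self.concepts = {}  # preferred label -> Concept
        self._index = {}    # lower-cased name -> preferred label

    def add(self, label, synonyms=(), parents=()):
        self.concepts[label] = Concept(label, set(synonyms), set(parents))
        # Index the preferred label and every synonym, case-insensitively.
        for name in (label, *synonyms):
            self._index[name.lower()] = label

    def resolve(self, name):
        """Return the preferred label for any known name, or None."""
        return self._index.get(name.strip().lower())

    def is_a(self, label, ancestor):
        """True if `ancestor` is `label` itself or reachable via parent links."""
        stack, seen = [label], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen or current not in self.concepts:
                continue
            seen.add(current)
            stack.extend(self.concepts[current].parents)
        return False


# Illustrative vocabulary: one gene class and its synonyms.
onto = Ontology()
onto.add("gene")
onto.add("PSEN1", synonyms=["PS-1", "Presenilin-1"], parents=["gene"])

print(onto.resolve("presenilin-1"))  # preferred label for a synonym
print(onto.is_a("PSEN1", "gene"))    # semantic relationship lookup
```

With this structure, any dataset that mentions a synonym can be mapped to the same preferred label before it reaches a predictive model, and the parent links let queries about a broad class also retrieve its members.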
To extract insights, datasets need to be enriched and annotated. Semantic enrichment is a key step in unlocking the full potential of your data, whether structured or unstructured, public or proprietary. Annotating, tagging and adding metadata transforms text into clean, contextualized data in which ambiguity is resolved and synonyms are mapped to a single term. Use text analytics to extract keywords, concepts and terms for predictive models and harmonize synonyms to improve accuracy.
Data harmonization is especially important when using databases from multiple sources, as technical terms and abbreviations are common. For example, sophisticated semantic enrichment software can identify and extract related terms and patterns in text, harmonizing synonyms such as “heart attack” and “myocardial infarction” so that they are identified as the same entity by predictive models. This eliminates “noise” and ensures that predictive AI models are powered by high-quality, enriched data.
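As a rough illustration of synonym harmonization, the sketch below rewrites variant terms in free text to a single preferred term before the text is used downstream. The `SYNONYMS` table and the `harmonize` function are hypothetical and deliberately tiny; real semantic enrichment software draws on curated vocabularies and handles context, abbreviations, and inflected forms that a simple pattern match does not.

```python
import re

# Hypothetical synonym table: variant -> preferred term.
# The first pair comes from the example above; the second is illustrative.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}

# Match whole phrases only, case-insensitively. Longer variants are listed
# first so they win if two variants ever overlap.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(v) for v in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def harmonize(text):
    """Replace each known variant in `text` with its preferred term."""
    return _PATTERN.sub(lambda m: SYNONYMS[m.group(1).lower()], text)


# Two records from different sources that describe the same event.
record_a = harmonize("Prior Heart Attack")
record_b = harmonize("Prior myocardial infarction")
print(record_a)
print(record_a == record_b)
```

After harmonization, both records carry the same term, so a predictive model sees one entity rather than two unrelated strings.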
Structuring data for predictive AI through ontologies and applying semantic enrichment techniques is a highly specialized task that requires expert understanding of the domain being studied. While general-purpose AI models developed by tech companies are useful in a wide range of fields such as marketing and operations, scientific research comes with a set of niche challenges that require domain expertise.
Currently, few biopharmaceutical companies have the right mix of skills in-house for a task such as creating an ontology. While their researchers may be scientific domain experts, they lack the necessary technical capabilities. Data scientists are best positioned to solve this challenge because they combine technology skills with scientific domain expertise. They can understand the context of the question being asked in relation to the data available. Furthermore, they can build ontologies and vocabularies to ensure that predictive AI models return appropriate results and that important data is not overlooked.
AI is widely expected to revolutionize every industry. For those involved in preclinical drug discovery, the opportunities are enormous, but so are the challenges. To accelerate the discovery of medicines that meet the medical needs of patients around the world, pharmaceutical and biotech companies must bring together data, technology, and expertise. When these elements come together, AI can become a valuable support tool for researchers and usher in a new era of drug discovery.
