Monica Nandagopal, Senior Analyst, Beroe Inc.

Although clinical trials play an important role in fostering medical innovation, they frequently struggle to enroll diverse and representative patient populations. The combination of real-world data (RWD) and advanced machine learning (ML) models offers a powerful way to optimize patient recruitment. This article examines the key factors that define meaningful patient cohorts, reviews the main sources of RWD, and highlights ML techniques that improve patient identification and recruitment efficiency. Industry case studies show how RWD and AI-powered tools have accelerated trial timelines and increased inclusiveness. The analysis highlights the importance of integrating multiple data sources with customized ML algorithms to overcome recruitment challenges, reduce costs, and produce clinically generalizable results. This integrated approach is positioned to become the future standard for clinical research and to enable faster and more equitable delivery of effective treatments to a wider patient population.1
Need for an appropriate patient cohort
A well-selected cohort helps improve trial validity, reduce timeline delays, and meet regulatory expectations for real-world applicability. Data from an article published in 2019 showed that, compared with manual screening methods, automated eligibility pre-screening reduced patient screening times by 30% to 35%, increased the number of candidates who matched the relevant trial criteria by 15%, and increased the number of candidates who were approached and agreed to participate by 10%.2
Relevant patient populations are required to ensure generalizable outcomes and to generate statistically meaningful results that confirm the safety and efficacy of a treatment in a trial population that reflects the real-world diversity of age, gender, ethnicity, and clinical characteristics. Meaningful cohorts include underrepresented groups (the elderly, minorities, and patients with comorbidities), which helps address equity and improve treatment relevance. A well-defined cohort can also reduce screen failure rates and accelerate enrollment, study completion, and market access.
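One simple way to check whether a candidate cohort reflects real-world diversity is to compare each group's share in the cohort with its share in a reference population. The sketch below is illustrative only: the age bands, the reference shares, the 5-percentage-point tolerance, and the function name `representation_gaps` are all assumptions, not figures from the studies cited here.

```python
from collections import Counter

def representation_gaps(cohort_groups, reference_shares, tolerance=0.05):
    """Return groups whose cohort share falls short of the reference share.

    cohort_groups: one group label per enrolled or candidate patient.
    reference_shares: expected share of each group in the target population.
    tolerance: shortfall (as a fraction) that is still considered acceptable.
    """
    counts = Counter(cohort_groups)
    total = len(cohort_groups)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        shortfall = expected - observed
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps

# Hypothetical cohort of 100 patients, labelled by age band.
cohort = ["18-44"] * 50 + ["45-64"] * 40 + ["65+"] * 10
# Hypothetical reference population shares for the same age bands.
reference = {"18-44": 0.35, "45-64": 0.35, "65+": 0.30}

gaps = representation_gaps(cohort, reference)
print(gaps)  # only the under-represented groups are reported
```

Here only the oldest age band is flagged, because it makes up 10% of the cohort against an assumed 30% of the reference population.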
Selecting a patient cohort
The main factors that influence patient selection are clinical, geographical, patient perception, and demographic parameters. Evaluating each parameter and assigning it an appropriate weight helps researchers determine a patient's suitability for the trial. Pharmaceutical companies can apply these weighted parameters to identify the most important factors in selecting the right patients for a particular therapeutic area. For example, when selecting RWD for rare diseases, higher weights should be assigned to clinical eligibility and access parameters, as these factors have a significant impact on clinical trial success.3

Source: Beroe Analysis
The weights above are indicative and can be used when evaluating patients for clinical trials. They can be adjusted based on disease severity and access to patients, and combined with multiple sources of RWD to give a better understanding of patient status and clinical morbidity.
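As a minimal sketch of how indicative weights of this kind might be applied, the snippet below computes a weighted suitability score for one candidate. The weight values, the 0-to-1 parameter scores, and the function name `suitability_score` are hypothetical placeholders and do not reproduce the weights from the Beroe analysis.

```python
# Hypothetical weights for the four parameter groups named in the text.
# These are placeholders for illustration, not the published weights.
DEFAULT_WEIGHTS = {
    "clinical": 0.4,
    "geographical": 0.2,
    "patient_perception": 0.2,
    "demographic": 0.2,
}

def suitability_score(scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted average of per-parameter scores, each in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing parameter scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * scores[name] for name in weights)
    return weighted_sum / total_weight

# Hypothetical candidate, scored 0 to 1 on each parameter.
candidate = {
    "clinical": 0.9,
    "geographical": 0.5,
    "patient_perception": 0.7,
    "demographic": 0.6,
}
score = suitability_score(candidate)

# Assumed rare-disease variant: more weight on clinical eligibility and access.
rare_disease_weights = {
    "clinical": 0.5,
    "geographical": 0.3,
    "patient_perception": 0.1,
    "demographic": 0.1,
}
rare_score = suitability_score(candidate, rare_disease_weights)
```

Dividing by the total weight keeps the score between 0 and 1 even if the weights are later adjusted and no longer sum to exactly one.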
RWD sources
Traditional recruitment is often slow, expensive, and inefficient, so trials struggle to enroll well-qualified and diverse patient populations. RWD helps by providing a richer and broader view of patient populations that better reflects real-world demographics and clinical reality. Combining these sources gives a comprehensive understanding of disease prevalence, patient characteristics, and physician treatment patterns.3 The main RWD sources are:



Highlights of RWD adoption in clinical research
The following analysis describes the RWD landscape by parameters such as region and therapeutic area.

Source: Global Data
The chart above shows that oncology is the leading therapeutic area for RWD and real-world evidence (RWE), accounting for 34% of studies that incorporate these elements. Central nervous system trials represent 12% of studies using RWD/RWE, followed by cardiovascular indications at 10%.4
China leads the geographical distribution of RWE trials, accounting for 30% of the global total, followed by Italy (10%), Germany (10%), the US (9%), and Japan (8%).4
The concentration of RWD/RWE trials in a given region is influenced by factors such as the regulatory environment, data availability and quality, health system maturity, and regional adoption of innovative trial methods. Combining multiple RWD sources such as electronic health records (EHRs), claims, registries, pharmacy records, wearables, and patient-generated data improves patient identification accuracy and recruitment efficiency. This multi-source integration is especially valuable in oncology and other complex therapeutic areas where patient heterogeneity and dynamic disease conditions are common.5-8
Potential solutions
Combination of RWD sources for recruitment efficiency
Useful combinations of data sources include:
- EHR and claims data: Provide comprehensive patient snapshots, including both detailed clinical profiles and care patterns. Suitable for large-scale studies that require detailed patient journeys.
- EHR and patient-reported data: Allow recruiters to consider both objective clinical measures and subjective patient experiences, broadening the eligible pool. Suitable for recruitment based on quality of life or disease symptoms.
- Hospital data and wearable data: Allow timely identification of patients who are developing or progressing to the target clinical condition. Suitable for dynamic conditions and digital health research.
- EHR and lab data: Reduce pre-screening failures by quickly ruling out patients with contraindications or lab values outside the required range before outreach. Suitable for drug trials with specific biomarker requirements.7,8
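The EHR and lab data combination can be sketched as a simple join on a patient identifier followed by an eligibility filter. Everything in the example below is hypothetical: the record fields, the diagnosis code, the kidney-function lab value (`egfr_ml_min`), and the threshold of 60 are assumptions chosen only to show the mechanics.

```python
# Hypothetical EHR extract: one row per patient.
ehr_records = [
    {"patient_id": "P1", "diagnosis": "NSCLC", "age": 64},
    {"patient_id": "P2", "diagnosis": "NSCLC", "age": 71},
    {"patient_id": "P3", "diagnosis": "COPD", "age": 58},
    {"patient_id": "P4", "diagnosis": "NSCLC", "age": 49},
]

# Hypothetical lab extract: latest kidney-function value per patient.
lab_results = [
    {"patient_id": "P1", "egfr_ml_min": 75},
    {"patient_id": "P2", "egfr_ml_min": 38},
    {"patient_id": "P3", "egfr_ml_min": 90},
    # P4 has no lab result on file.
]

def prescreen(ehr, labs, diagnosis, min_egfr):
    """Return IDs of patients with the target diagnosis and an adequate lab value."""
    labs_by_id = {row["patient_id"]: row for row in labs}
    eligible = []
    for patient in ehr:
        lab = labs_by_id.get(patient["patient_id"])
        if lab is None:
            # Without a lab value eligibility cannot be confirmed before outreach.
            continue
        if patient["diagnosis"] == diagnosis and lab["egfr_ml_min"] >= min_egfr:
            eligible.append(patient["patient_id"])
    return eligible

shortlist = prescreen(ehr_records, lab_results, diagnosis="NSCLC", min_egfr=60)
print(shortlist)
```

In this toy data only P1 passes: P2 is ruled out by the lab threshold, P3 by diagnosis, and P4 because no lab value is available.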
Machine learning models for pre-screening of patients
Identifying patterns with an appropriate ML model can help determine which patients are likely to benefit from a particular trial or are most likely to meet its criteria. Integration with EHRs takes this a step further by allowing real-time access to comprehensive patient data, including medical history, diagnoses, and treatments.9-11
Different algorithms serve different purposes. Some examples are given below.

Source: Intermediate Articles and Beroe Analysis
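As a rough illustration of pattern-based pre-screening, the sketch below fits a small logistic regression in plain Python and uses it to rank candidates by their predicted likelihood of meeting trial criteria. Logistic regression is chosen here only as a simple example of predictive modeling; the training data, the two features (scaled age and a biomarker flag), and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y  # gradient of the log-loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias

def predict_proba(x, weights, bias):
    """Predicted probability that a patient meets the trial criteria."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical training set: [age scaled to 0-1, biomarker positive (1) or not (0)].
train_x = [
    [0.2, 1], [0.3, 1], [0.4, 1], [0.5, 1],
    [0.6, 0], [0.7, 0], [0.8, 0], [0.3, 0],
]
# Label 1 means the patient met the eligibility criteria on manual review.
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)

# Hypothetical new candidates to prioritize for recruiter outreach.
candidates = {"C1": [0.35, 1], "C2": [0.65, 0]}
ranked = sorted(
    candidates,
    key=lambda cid: predict_proba(candidates[cid], weights, bias),
    reverse=True,
)
```

The ranked list puts the candidates most likely to qualify first, so recruiters can start outreach at the top. A real deployment would rely on a validated library and on properly governed patient data rather than this toy setup.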
Real-world examples of RWD and ML
Pharmaceutical companies are beginning to use RWD and ML models to enroll the best-matched patients in clinical trials. Some real-world examples include:

Source: Press Release
Recommendations for adopting RWD and ML
Pharmaceutical companies are actively employing RWD combined with advanced ML models to optimize patient recruitment in clinical trials. Recommendations for those interested in adopting this approach:
- Build a comprehensive and accurate understanding of patient profiles and disease status by combining complementary RWD sources such as EHRs, claims data, patient-reported outcomes, hospital data, wearables, and laboratory data.
- Apply appropriate ML algorithms, including predictive modeling, natural language processing, ensemble methods, and deep learning, to screen, identify, and prioritize patients who meet trial eligibility criteria, and to predict patient retention and optimize diversity.
- Incorporate weighted assessments of key factors (clinical eligibility, demographic diversity, geographic access, patient perception, data quality) so that cohorts represent real-world populations and address health disparities.
- Create high-quality recruitment materials by integrating regulatory and compliance guidelines and leveraging AI-powered tools that minimize the risk of regulatory setbacks.
- Promote strategic partnerships between pharmaceutical companies and technology or data platform providers to accelerate the implementation of AI-driven recruitment solutions and maximize patient inclusion and trial efficiency.
These integrations allow clinical trials to recruit relevant and diverse patient cohorts faster and more cost-effectively, ultimately shortening research timelines and improving the generalizability of trial results. This approach supports the broader goal of bringing safe and effective treatments to patients faster and more equitably.15,16
References:
- M. Abdalah Ismail, P. Talha Al-Zoubi, P. Issam El Naqa, M. Hina Saeed, “The role of artificial intelligence to speed up recruitment in clinical trials,” Oxford Academic, p. 5, 2023.
- I. Spasic, D. Krzeminski, P. Corcoran, A. Balinsky, “Cohort Selection for Clinical Trials from Longitudinal Patient Records: A Text Mining Approach,” National Center for Biotechnology Information, 2019.
- Biopharma Dive, “Biopharma Dive”, April 17, 2023. [Online]. Available: https://www.biopharmadive.com/spons/patient-centered-clinical-trials-emprove-recruitment-and-retention/647481/
- Global Data, “Clinical Trial Arena,” November 2024. [Online]. Available: https://www.clinicaltrialsarena.com/analyst-comment/real-world-evidence-trials-increase-2024/?
- M. Grabner, C. Molife, L. Wang, K. Winfree, Z. Cui, G. Cuyun Carter, L. Hess, “Data integration to improve research into real-world health outcomes of non-small cell lung cancer in the United States: descriptive and qualitative exploration,” JMIR Cancer, vol. 7, no. 2, 2021.
- T. Beukelman, L. Chen, N. Annapureddy, J. Oates, MEB Clowse, M. Long, MD Kappelman, RL Rhee, PA Merkel, WB Nowell, F. Xie, C. Clinton, JR Curtis, National Center for Biotechnology Information, p. 32, 2023.
- Ms Janssen, Om Deckers, S. L. Cessy, L. Hoft, H. Garders dottil, A. d. Boer and Rhh Groenwold, “Real-World Evidence Notifies Regulatory Decision: A Scoping Review,” American Association for Clinical Pharmacology and Treatment, 2024.
- “Iqvia”, July 4th, 2004. [Online]. Available: https://www.iqvia.com/library/publications/unlock-the-keys-to-effective-real-world-data-usage.
- X. Lu, C. Yang, L. Liang, G. Hu, Z. Zhong, Z. Jiang, “Artificial Intelligence for Recruitment and Retention in Clinical Trials: A Scoping Review,” National Center for Biotechnology Information, p. 32, 2025.
- M. Samuel Caskovic, M. Kirk D. Wyatt, P. Thomas Oliwa, M. Luca Graglia, M. Brian Farner, P. Juholy, P. American Society of Clinical Oncology Journal, Vol. 7, 2023.
- A. Iyer and S. Narayanaswami, “New Model Using ML Technology for Clinical Trial Design and Rapid Patient Onboarding Processes,” National Center for Biotechnology Information, 2025.
- K. Kantor and M. Morzy, “Eligibility Criteria for Machine Learning and Natural Language Processing in Clinical Trials: A Scoping Review,” National Center for Biotechnology Information, 2024.
- D. Kitashian, “Pfizer's AI Strategy: An Analysis of Domination in Pharma”, klover.ai, 2025.
- G. McDonald, “BioXconomy,” November 18, 2024. [Online]. Available: https://www.bioxconomy.com/clinical-and-research/sanofi-to-use-ai-to-accelerate– recruitment-in-iii-ms-program
- Drug Patent Watch, “Drug Patent Watch”, July 25, 2025. [Online]. Available: https://www.drugpatentwatch.com/blog/8-applications-machine-learning-pharmaceutical-industry/.
- K. Getz, “New insights into the impact of AI-enabled solutions,” Applied Clinical Trials, vol. 34, no. 3, 2025.
About the author:
Monica Nandagopal is a Category Research Analyst with over 6 years of experience in market research and consulting. Her insights support top pharma companies' strategic decisions on supplier outsourcing, category management, and planning. Over the past year, she has worked on more than 10 market procurement studies, five supplier data visualizations, and multiple rapid-response analyses for clients with global and regional requirements.
