Generative artificial intelligence (Gen AI) can quickly and accurately screen patients to determine their eligibility for clinical trials, according to a new study from researchers at Mass General Brigham. Such technology could make it faster and cheaper to evaluate new treatments and ultimately help bring effective therapies to patients.
The researchers evaluated the accuracy and cost of a Gen AI process they named RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review (RECTIFIER), where RAG stands for retrieval-augmented generation. The process identifies patients who meet the criteria for enrollment in a heart failure trial based on their medical records. For criteria that require reviewing patient records, they found that RECTIFIER screened patients more accurately than the disease-trained research coordinators who typically perform the screening, and at a fraction of the current cost. The findings are published in NEJM AI.
“We see that large language models have the potential to fundamentally improve clinical trial screening. Now we are beginning the difficult work of determining how to integrate this capability into real-world trial workflows to simultaneously improve efficacy, safety, and fairness.”
Samuel (Sandy) Aronson, ALM, MA, co-senior study author, executive director of IT and AI Solutions for Mass General Brigham Personalized Medicine, and senior director of IT and AI Solutions for the Accelerator for Clinical Transformation
Clinical trials enroll people who meet certain criteria, such as age, diagnosis, major health indicators, and current or past medications. These criteria help ensure that researchers enroll participants who are representative of the people expected to benefit from the treatment. Enrollment criteria also help ensure that trials do not enroll patients who have unrelated health problems or who are taking medications that may interfere with the results.
“Participant screening is one of the most time-consuming, labor-intensive and error-prone activities in a clinical trial,” said co-first author Ozan Unlu, MD, a clinical informatics fellow at Mass General Brigham and a cardiovascular medicine fellow at Brigham and Women's Hospital.
The research team, part of the Mass General Brigham Accelerator for Clinical Transformation, tested the ability of the AI process to identify patients eligible for the Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial. The trial recruits patients with symptomatic heart failure and identifies potential participants based on electronic health record (EHR) data. The researchers designed 13 prompts to assess clinical trial eligibility. They tested and refined these prompts using a small number of patients' charts and then applied them to a dataset of 1,894 patients with an average of 120 notes per patient. They then compared the screening performance of the process to that of study staff.
The accuracy of the AI process, measured by its agreement with a “gold standard” assessment by expert clinicians of whether patients met the trial criteria, ranged from 97.9% to 100%. Study staff reviewing the same medical records were slightly less accurate, with accuracy ranging from 91.7% to 100%.
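To illustrate how accuracy against a gold standard can be measured, here is a minimal Python sketch. It is not the study's code: the function name, the criterion label `symptomatic_heart_failure`, and the four toy patients are all hypothetical, and the labels are made up purely for illustration.

```python
from collections import defaultdict


def accuracy_by_criterion(gold, predicted):
    """Return {criterion: fraction of patients whose prediction matches gold}.

    Both arguments map (patient_id, criterion) -> bool (criterion met or not).
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for (patient_id, criterion), gold_answer in gold.items():
        totals[criterion] += 1
        if predicted.get((patient_id, criterion)) == gold_answer:
            matches[criterion] += 1
    return {criterion: matches[criterion] / totals[criterion] for criterion in totals}


# Hypothetical toy labels, NOT data from the study.
CRITERION = "symptomatic_heart_failure"
gold = {
    ("p1", CRITERION): True,
    ("p2", CRITERION): False,
    ("p3", CRITERION): True,
    ("p4", CRITERION): True,
}
ai_answers = dict(gold)              # toy case: the AI agrees with every gold label
staff_answers = dict(gold)
staff_answers[("p2", CRITERION)] = True  # toy case: staff disagree on one patient

ai_accuracy = accuracy_by_criterion(gold, ai_answers)
staff_accuracy = accuracy_by_criterion(gold, staff_answers)
print(ai_accuracy, staff_accuracy)
```

Computing one figure per criterion, rather than a single overall number, is what yields a range of accuracies such as the ones reported above.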
The researchers estimate that the AI model costs about $0.11 to screen each patient, which the authors explain is an order of magnitude cheaper than traditional screening methods.
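As a rough back-of-the-envelope illustration, the per-patient figure can be scaled to the 1,894-patient dataset. The sketch below assumes the $0.11 estimate applies uniformly to every patient; the resulting total is our own arithmetic, not a number reported by the authors, and the function name is hypothetical.

```python
# Work in whole cents to avoid floating-point rounding surprises.
COST_PER_PATIENT_CENTS = 11  # about $0.11 per patient, as estimated by the researchers
PATIENTS_SCREENED = 1894     # size of the evaluation dataset


def total_screening_cost_cents(patients, cost_per_patient_cents):
    """Total cost in cents, assuming a uniform per-patient cost (an assumption)."""
    return patients * cost_per_patient_cents


total_cents = total_screening_cost_cents(PATIENTS_SCREENED, COST_PER_PATIENT_CENTS)
total_dollars = f"${total_cents // 100}.{total_cents % 100:02d}"
print(total_dollars)  # prints $208.34
```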
Co-author Alexander Blood, MD, a cardiologist at Brigham and Women's Hospital and associate director of the Accelerator for Clinical Transformation, noted that using AI in clinical trials can reduce the time it takes to determine whether a treatment works. “If we can speed up the clinical trial process and conduct cheaper, more equitable trials without sacrificing safety, we can get drugs to patients sooner and make them available to a wider range of people,” Blood said.
The researchers noted that there are potential risks to watch for when integrating AI into everyday workflows: AI can introduce bias and miss nuances in medical records. Additionally, changes to how data is captured in the health system can have a significant impact on AI performance.
For these reasons, the authors conclude that any study that uses AI to screen patients must include some form of check. Most trials already have clinicians who double-check the participants whom research staff deem eligible, and the researchers recommend keeping this final check in place when AI performs the screening.
“Our goal is to prove that this works in other disease areas and use cases, as we expand beyond the walls of Mass General Brigham,” Blood added.
Journal reference:
Unlu, O., et al. (2024) Retrieval-Augmented Generation-Enabled GPT-4 for Clinical Trial Screening. NEJM AI. doi.org/10.1056/AIoa2400181.
