Large language models (LLMs) such as ChatGPT are rapidly being introduced into medicine, but strong clinical evidence for their use in practice remains limited. A new study published in the journal Gastroenterology & Endoscopy provides the first overview of randomized controlled trials (RCTs) specifically evaluating LLMs in gastrointestinal diseases.
An international research team systematically reviewed published and ongoing RCTs conducted since 2022 and identified only 14 eligible trials worldwide (4 published and 10 ongoing). Most studies were conducted in China and the United States and mainly focused on gastrointestinal and hepatobiliary diseases. The most common uses of LLMs were clinical decision making and patient education, with question answering as the primary task.
“While there is growing enthusiasm for the use of LLMs in gastrointestinal diseases, we found that high-quality clinical evidence is still lacking,” said the study's lead author, Dr. Peng Wu, of the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. “Randomized controlled trials are essential to determine whether these tools truly improve patient outcomes and quality of care.”
Of note, although many studies claimed clinical relevance, only some used real patient data, and most trials were single-center and exploratory in nature. The authors also found that both general-purpose models (e.g., ChatGPT) and domain-specific medical language models were tested, reflecting different strategies for integrating AI into clinical workflows.
Co-corresponding author Dr. Zhirong Yang emphasized the importance of careful implementation. “Large language models should not replace clinicians; instead, they should be evaluated as support tools that extend clinical capabilities while maintaining human oversight,” he said.
The review also highlights several gaps in current research, including the lack of international multicenter trials, inconsistent reporting standards, and limited assessment of ethical risks such as hallucinated output and data privacy. The authors urge future trials to adopt standardized reporting guidelines and focus on real patient outcomes.
Overall, this study is a timely demonstration of how AI language models are beginning to transition from laboratory tools to potential clinical assistants in gastrointestinal medicine, and highlights the urgent need for strong evidence before widespread adoption.
