According to a preprint submitted to arXiv, the number of registered AI-related clinical trials is increasing rapidly, with China and the United States registering more of them than any other country.
The study, Trends in AI and Human-AI Interaction in Clinical Trials – Exploring Human-AI Hybrids, was prepared for the Hybrid Human-Artificial Intelligence Conference HHAI2026 Workshop on Health, Well-Being, and Human-AI Interaction. It analyzes AI-related records from ClinicalTrials.gov, the world’s largest public clinical trial registry, and tests whether a hybrid workflow combining frontier generative AI models with human review can help classify how AI is used in clinical trials.
The authors searched ClinicalTrials.gov using a broad AI-focused search string covering AI, machine learning, deep learning, computer vision, natural language processing, neural networks, expert systems, chatbots, ChatGPT, GPT, and large language models.
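To illustrate what such a broad search might look like in practice, here is a minimal Python sketch that joins the reported terms into a single OR expression and builds a query URL. The exact search string the authors used is not given in this article, so the term list, the helper names, and the endpoint and parameter names (assumed here to follow the ClinicalTrials.gov API v2) are all assumptions to be checked against the registry's own documentation.

```python
from urllib.parse import urlencode

# Terms as described in the article; the authors' exact search string is not
# reproduced here, so this list is an assumption for illustration only.
AI_TERMS = [
    "AI",
    "artificial intelligence",
    "machine learning",
    "deep learning",
    "computer vision",
    "natural language processing",
    "neural network",
    "expert system",
    "chatbot",
    "ChatGPT",
    "GPT",
    "large language model",
]


def build_search_expression(terms):
    """Join terms with OR, wrapping multi-word phrases in double quotes."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


def build_query_url(terms, page_size=100):
    """Build a query URL (hypothetical helper; no request is sent).

    The base URL and the parameter names are assumptions about the
    ClinicalTrials.gov API v2 and should be verified before use.
    """
    base = "https://clinicaltrials.gov/api/v2/studies"
    params = {"query.term": build_search_expression(terms), "pageSize": page_size}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_expression(AI_TERMS))
    print(build_query_url(AI_TERMS))
```

Because short terms such as AI and GPT can match unrelated text, a search this broad would be expected to return false positives, which is one reason a later screening step is needed.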
AI terminology is expanding across clinical trials
The study reveals significant changes in the language used to describe AI in clinical trials. Older AI-related terms, such as expert systems, appear mainly in older records, while recent trials increasingly use terms associated with modern data-driven AI. References to artificial intelligence, AI, machine learning, and deep learning have proliferated over the past decade, and terms related to chatbots, GPT, and large language models have grown particularly rapidly since the release and widespread adoption of new conversational AI systems.
The term AI returned the most unique records, followed by artificial intelligence, machine learning, and deep learning. The terms chatbot, GPT, neural network, and expert system also captured a significant number of unique records, demonstrating that no single term captures the full scope of AI-related clinical research.
This matters because the clinical AI field is fragmented in both technology and language. Trial records may describe systems as machine learning, computer vision, chatbots, algorithms, expert systems, or decision support tools. In some cases, AI may be embedded within broader digital interventions, making it difficult to determine whether AI is central to the trial or merely a background component.
The paper highlights significant problems with reporting. Trial registry records often do not clearly define the AI methodology, system role, data input, users involved, or level of human-AI interaction. This lack of detail creates a barrier for researchers trying to compare AI trials, assess safety and fairness, and understand how AI is being tested in real-world clinical settings.
The problem goes beyond AI. Similar gaps emerge in research into digital health and wearable technology, where trial records often lack basic details such as which version was tested, what the technology actually does, how well it works, and where it will be used. With AI, these gaps can carry greater risks because patient outcomes depend not only on model accuracy but also on how the model fits into real-world clinical workflows.
The authors note that existing reporting guidance, such as SPIRIT-AI for AI trial protocols and CONSORT-AI for completed randomized trial reports, was developed to improve transparency. However, compliance remains incomplete in published studies, and AI trial descriptions often still lack the information needed to understand how the system interacts with patients, clinicians, and other users.
China and US lead growth in AI testing
The geographic pattern of AI-related clinical trials shows clear concentration. China and the United States account for the largest number of AI-classified trials in the dataset, with each country recording roughly four times as many trials as the next highest country. Italy came in third place, followed by France, Spain, the United Kingdom, Turkey, Taiwan, Germany, South Korea, India, Canada, the Netherlands, Singapore, and Japan.
According to the study, the number of AI clinical trials in China has rapidly increased since 2018, overtaking that in the United States. Both countries currently dominate registered AI-related clinical research, reflecting extensive investments in AI, digital health, hospital technology, and biomedical research infrastructure.
The global reach of AI testing continues to grow. Several countries in Europe and Asia have shown notable increases in recent years, suggesting that AI clinical research is moving beyond its early hubs and into a broader international phase. That said, the paper notes that many records do not declare a location, limiting the accuracy of geographic analysis.
The dataset included both interventional and observational studies. Of the 5,828 records returned, 3,019 were interventional studies, 2,807 were observational studies, and 2 were expanded access records. Rather than restricting their analysis to a single trial design, the researchers retained the observational studies because they aimed to understand how AI use is documented across clinical research.
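As a quick consistency check on the figures above, the three reported counts can be summed and expressed as shares of the total. This is only an arithmetic sketch using the numbers quoted in this article; the variable names are illustrative.

```python
# Record counts as reported in the article.
counts = {
    "interventional": 3019,
    "observational": 2807,
    "expanded_access": 2,
}
reported_total = 5828

# The three study types should add up to the reported total.
computed_total = sum(counts.values())
assert computed_total == reported_total

# Share of each study type, as a percentage of all returned records.
shares = {name: 100 * n / computed_total for name, n in counts.items()}

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
```

The counts are internally consistent, with interventional and observational studies each making up roughly half of the returned records.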
The paper’s findings add to previous evidence showing that registered AI and machine learning clinical research has increased significantly since 2010. However, the authors extend that work by focusing not only on the use of AI but also on human-AI interaction, an underdeveloped area in healthcare research.
Human-AI interaction is important because AI systems in healthcare rarely work alone. Diagnostic tools may guide clinicians, chatbots communicate with patients, monitoring systems alert medical teams, and decision support platforms help shape treatment plans. In each case, clinical outcomes will depend in part on how people receive, interpret, and act on the output of AI.
The study defined the following interaction categories: no use of AI, use of AI without human-AI interaction, patient-AI interaction, caregiver-AI interaction, healthcare professional-AI interaction, other human-AI interactions, and hybrid AI interactions involving multiple types of users.
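One way to make these categories concrete is to encode them as a fixed set of labels. The sketch below does this with a Python enum. The category names follow the article, but the identifiers and the `combine_user_categories` helper, including its rule that more than one user type maps to the hybrid category, are hypothetical and are not taken from the paper.

```python
from enum import Enum


class InteractionCategory(Enum):
    """Interaction categories as listed in the article."""

    NO_AI = "no use of AI"
    AI_NO_INTERACTION = "use of AI without human-AI interaction"
    PATIENT_AI = "patient-AI interaction"
    CAREGIVER_AI = "caregiver-AI interaction"
    PROFESSIONAL_AI = "healthcare professional-AI interaction"
    OTHER_HUMAN_AI = "other human-AI interaction"
    HYBRID = "hybrid AI interaction (multiple user types)"


# Categories that describe a single type of human user.
SINGLE_USER_CATEGORIES = {
    InteractionCategory.PATIENT_AI,
    InteractionCategory.CAREGIVER_AI,
    InteractionCategory.PROFESSIONAL_AI,
    InteractionCategory.OTHER_HUMAN_AI,
}


def combine_user_categories(user_categories):
    """Collapse the user types found in an AI trial into one label.

    Hypothetical rule (an assumption, not the authors' procedure):
    no users -> AI without interaction, one user type -> that type,
    several user types -> hybrid.
    """
    users = set(user_categories)
    if not users <= SINGLE_USER_CATEGORIES:
        raise ValueError("expected only single-user categories")
    if not users:
        return InteractionCategory.AI_NO_INTERACTION
    if len(users) == 1:
        return next(iter(users))
    return InteractionCategory.HYBRID
```

Using a closed set of labels like this forces every record into exactly one category, which makes disagreements between reviewers easy to count but also exposes records whose descriptions are too vague to classify.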
Hybrid reviews show promise, but trial reporting remains a barrier
The study also tested a hybrid human-AI workflow for screening and classifying clinical trial records. Researchers used GPT-5.5 through the OpenAI API to classify trial records and provide concise descriptions and confidence ratings. A human reviewer then classified a representative random sample of 100 records, and a third reviewer resolved any disagreements.
The results suggest that AI-assisted screening can help identify records that do not make substantial use of AI. In the sample of 100 records, the human and AI classifiers both identified 14 trials as not using AI, while their classifications differed on 2 trials. Both of those cases involved genuine ambiguity, and the AI system also reported low confidence in them.
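Agreement between a human and an AI classifier on a task like this is often summarized with raw agreement and Cohen's kappa, which corrects for agreement expected by chance. The article does not say which agreement statistics the authors reported, so the sketch below is a standard technique offered for illustration, not their method. The toy data assume that the 84 remaining records were classified as using AI by both raters and that the 2 mismatches split one each way; both are assumptions, not figures from the paper.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Fraction of items on which the two raters gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(labels_a)
    observed = raw_agreement(labels_a, labels_b)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled at random according to their own label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (assumed breakdown, for illustration only):
# 14 agreed "no AI", 84 agreed "AI", 2 disagreements split one each way.
human = ["no_ai"] * 14 + ["ai"] * 84 + ["no_ai", "ai"]
model = ["no_ai"] * 14 + ["ai"] * 84 + ["ai", "no_ai"]

if __name__ == "__main__":
    print(f"raw agreement: {raw_agreement(human, model):.2f}")
    print(f"Cohen's kappa: {cohen_kappa(human, model):.3f}")
```

Kappa is useful here because one label is much more common than the other, so high raw agreement alone would overstate how well the two classifiers agree.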
Classifying human-AI interactions proved a more difficult task. Among records where AI use was acknowledged, there was often disagreement over whether an interaction occurred and which human group interacted with the system. The most common disagreement was over whether trials should be classified as healthcare professional-AI interaction or no human-AI interaction. This uncertainty reflects the limitations of registry records. Although some descriptions refer to AI systems, it is often not clear whether the clinician receives the AI output directly, the AI runs in the background, the patient interacts with the tool, or the AI only processes data outside of the clinical workflow. In complex interventions, a system can influence decision-making even when the record does not clearly document who is involved.
Human classifiers sometimes declined to classify a record when it did not contain enough information, especially when trying to classify human-AI interactions. This presents a core challenge for both human and machine review: the quality of the classification depends on the quality of the reporting.
The authors also note that AI-assisted reviews are not free of cost. Their experimental workflow involved extensive API usage and cost $75.17 for the main computations. They also used a paid ChatGPT Pro account for their pilot work. The paper also notes the environmental costs of AI computation, estimating energy usage on the order of tens of kilowatt-hours, with associated carbon emissions.
The findings point to several policy and research needs. Clinical trial records should more clearly specify whether AI will be used, the type of AI involved, whether a user will interact with the AI, who that user will be, what expertise will be required, and how the output of the AI will be integrated into the intervention or workflow. Clearer interaction categories would also help reviewers compare trials and assess clinical risk.
