AI enhances medical decision-making, Stanford study finds



Chatbots powered by artificial intelligence are quite good at diagnosing some diseases, even complex ones. But how can chatbots help guide treatment and care after a diagnosis is made? For example, how long before surgery should a patient stop taking prescribed blood thinners? Should a patient’s treatment protocol be changed if they have had adverse reactions to similar medications in the past? There are no textbook right or wrong answers to these types of questions; they depend on a doctor’s judgment.

Jonathan H. Chen, MD, PhD, and a team of researchers are investigating whether chatbots powered by large language models (LLMs) can effectively answer such nuanced questions, and whether physicians supported by chatbots perform better.

As it turns out, the answer is yes and yes. The research team tested how the chatbot performed when faced with various clinical crossroads. The chatbot alone outperformed doctors who had access only to internet searches and medical references, while doctors equipped with an LLM, drawn from multiple regions and facilities across the United States, performed on par with the chatbot.

“I’ve been saying for years that humans and computers together will do better than either one alone,” Chen said. “I think this study challenges us to think more critically about that and ask ourselves, ‘What are computers good at? What are humans good at?’ We may need to rethink where we use and combine those skills, and which tasks we employ AI for.”

A study detailing these results was published in the journal Nature Medicine on February 5, 2025. Chen and Adam Rodman, MD, assistant professor at Harvard University, are co-senior authors. Postdoctoral researchers Ethan Goh, MD, and Robert Gallo, MD, are co-lead authors.

Assisted by chatbots

In October 2024, Chen and Goh led a research team whose study was published in JAMA Network Open. That study tested how chatbots perform at diagnosing diseases and found that the chatbot’s diagnostic accuracy was higher than that of doctors, even doctors who used chatbots. The new paper delves into a trickier aspect of medicine, evaluating the performance of chatbots and doctors on questions that fall into a category called “clinical management reasoning.”

Goh explains the difference: Imagine you are using a maps app on your phone to guide you to a specific destination. Using an LLM to diagnose a disease is like using the map to pinpoint the correct location. Deciding how to get there is management reasoning. Do you take back roads because there is traffic? Stay on course, even bumper to bumper? Or do you wait and hope the road clears?

In a medical setting, these decisions can be difficult. Suppose a doctor happens to discover that a hospitalized patient has a sizable mass in the upper part of the lung. What’s the next step? The doctor (or chatbot) needs to know that large nodules in the upper lobes of the lung are statistically more likely to spread through the body. The doctor could biopsy the mass immediately, schedule the procedure for a later date, or order imaging tests to learn more.

Deciding which approach is best for a patient requires weighing many details, starting with the patient’s known preferences. Are they reluctant to undergo invasive procedures? Does the patient’s history show missed follow-up appointments? Can the hospital’s health system be trusted to schedule follow-up appointments and referrals? Such situational factors are important to consider, Chen said.

The team designed a trial to compare clinical management reasoning across three groups: the chatbot alone, 46 physicians with chatbot support, and 46 physicians with access only to internet searches and medical references. They selected five anonymized patient cases and presented them to the chatbot and the doctors. All physicians provided written responses detailing what they would do in each case, why, and what they considered in making their decisions.

Additionally, the researchers tapped a group of board-certified physicians to create a rubric for judging whether medical decisions were appropriate. The responses were then scored against that rubric.

To the team’s surprise, the chatbot outperformed the doctors who had access only to the internet and medical references, ticking more rubric items than those doctors did. Doctors paired with a chatbot, however, performed as well as the chatbot alone.

Order of operations

Chen and colleagues followed up on that discovery. Simply telling clinicians to use AI is not enough; successful collaboration depends on how the tools are integrated into the medical workflow. In other words, who should go first? Should the doctor evaluate the case and then consult the AI for a second opinion, or should the AI make the first call?

In a paper published in npj Digital Medicine on March 18, 2026, the research team detailed the results of a randomized controlled trial in which 70 doctors collaborated with an AI agent to evaluate a series of medical cases. The researchers found that when the AI evaluated a case after a doctor had, it tended to side with the doctor, even when instructed to reason independently. The winning approach turned out to be parallel analysis: the doctor and the AI evaluated the case simultaneously, and the AI tool then produced a summary comparing the two takes, noting where they agreed and where they diverged.

“The basic idea was to study how the behavior of AI systems could be redesigned to support deeper collaboration between physicians and AI, moving the use of AI from interacting with tools to collaborating with clinical teammates,” said Celyn Everett, a medical student at Stanford University and lead author of the study.

What is the future of chatbot doctors?

Exactly what spurred the successful collaboration between doctors and chatbots is up for debate. Does the LLM prompt doctors to think more carefully about a case, or does it offer guidance the doctors might not have arrived at on their own? That is a direction for future exploration, Chen said.

The strong results for chatbots, and for doctors paired with chatbots, lead to the oft-asked question: “Are there going to be AI doctors?”

“Maybe that’s an advantage for AI,” Chen said. But the results suggest chatbots are more likely to assist doctors than to replace them. “This doesn’t mean patients should skip their doctors and go straight to chatbots. Don’t do that,” he said. “There’s a lot of good information out there, but there’s also bad information. A skill we all need to develop is discerning what is trustworthy and what is not. That’s more important now than ever.”

Researchers from the VA Palo Alto Health Care System, Beth Israel Deaconess Medical Center, Harvard University, University of Minnesota, University of Virginia, Microsoft, and Kaiser contributed to the study.

This research was funded by the Gordon and Betty Moore Foundation, the Stanford Clinical Excellence Research Center, and a VA Medical Informatics Advanced Fellowship.

Stanford University School of Medicine also supported this research.
