Every knowledge-based profession may one day see AI outperform its human experts. For medicine, it looked as if that day had come in April. A group of researchers, primarily from Harvard and Stanford Universities, published the results of a study that pitted ChatGPT against hundreds of doctors in a diagnostic obstacle course that included written medical riddles and information from real patients. The bots won, but the humans weren’t entirely happy about it.
“I’m a little nervous about how some of these results will be used,” Adam Rodman, the study’s lead author, said at a press conference shortly before its publication in the journal Science. The study amounted to an academic exercise, he told reporters. However thorough the research, ChatGPT and other AI tools have not proven that they are ready to become a standard part of medical practice. His warnings were consistent with those of other experts, but as Rodman knew, most people would ignore them. AI has already infiltrated the U.S. healthcare system, and no evidence or safeguards will help.
Even as I was watching Rodman’s press conference, a message arrived on my cell phone from the administrators of the medical center where I work as a pathologist. They had emailed to say that we now have an “AI-powered clinical reasoning tool.” This wasn’t the first time I’d received this type of email. It wasn’t the second or third time, either. In fact, I’ve lost count of how many generative AI products have been rolled out to us in recent years, none of which has been approved for medical use by the FDA.
This enthusiasm strikes me as unprecedented. Healthcare is typically one of the slowest sectors to adopt new technology. I still use a pager and send faxes regularly. (Younger readers may want to ask Claude to explain what these are.) The preference for simple technology is partly a product of physicians’ safety-oriented culture. We know that a poorly timed glitch can be fatal. But these days, clinicians are allowed, even encouraged, to master the latest software, guided only by the common caveat that AI can make mistakes.
Those mistakes can have consequences. Rodman’s research shows that generative AI can help diagnose rare diseases and make sense of unusual symptoms, but a study published in NEJM AI just a week earlier found that intentionally erroneous output from an AI model can easily mislead doctors. Non-experts can be similarly misled. A recent study by Oxford scientists found that the use of AI does not significantly improve patients’ ability to diagnose themselves or others. Another paper, led by researchers at Mount Sinai, suggests that chatbots may fail to alert users to potential medical emergencies.
Misdiagnosis is not the only concern. As AI permeates the healthcare system, errors are occurring in unexpected places. When I spoke to Rodman by phone after the press conference, he said he was surprised to learn one day that his hospital, Beth Israel Deaconess Medical Center, had asked AI to draft messages to patients on his behalf. At times, it produced output that Rodman described as “completely ridiculous.” (Sarah Finloh, a spokeswoman for Beth Israel Lahey Health, told me that the use of AI tools is voluntary and subject to hospital training and support. She also said that all output from AI tools must be approved by a physician.)
Part of the problem is that health-related AI products can be introduced without FDA review. If a software package for physicians is classified as a “clinical decision support tool” rather than a medical device, it can usually avoid regulatory oversight. For an AI-powered app to fall into this category, it typically must rely on existing medical literature, avoid analyzing medical scans or images, explain its rationale, and leave diagnosis and treatment to doctors. Most of the generative AI products currently used by physicians appear to meet these criteria.
Consumer wellness apps and devices can also avoid FDA review as long as they are intended to “maintain or promote a healthy lifestyle” rather than to diagnose or treat a specific condition. With this in mind, Microsoft, OpenAI, Anthropic, and xAI all warn users that their health-related chatbots are not intended to provide medical care or to issue diagnoses or treatment recommendations. In practice, however, the distinction is not always clear-cut. Elon Musk is encouraging people to use his Grok chatbot to generate second opinions and interpretations of X-ray and MRI images. ChatGPT Health’s marketing video shows the app reassuring people that their test results are within a healthy range and encouraging them to continue taking their cholesterol medication.
Most of these apps prompt users to connect their medical records and wearable health devices. AI companies don’t need to ingest all this data just to provide general health information. A new product from medical startup Hims & Hers called Labs AI helps users interpret the results of “up to 130 biomarker tests,” providing “detailed, personalized, and actionable analysis of whole-body health, risks, and patterns.” As a pathologist, I also analyze patients’ test results and provide personalized, practical advice. What’s the difference?
When I contacted the makers of these products, they reaffirmed that no actual medical advice is provided to users. Dominic King, vice president of health at Microsoft AI, said in an emailed statement that the company’s Copilot app provides “information and support to help conversations with clinicians” and does not provide “a single definitive diagnosis.” Patrick Carroll, Hims & Hers’ chief medical officer, told me that Labs AI does not make diagnoses or recommend treatments. “That responsibility lies with the clinician, and Labs is designed to enforce that boundary.” Anthropic and xAI did not respond to my inquiries. OpenAI declined to comment for this article.
Perhaps the line between doctors and algorithms is somewhat artificial to begin with. One idea circulating in the medical literature is to stop treating AI products as if they were just standard medical devices. Given its human-like ability to learn new information and tailor answers to individual patients, medical AI may function more like a doctor than a defibrillator, and perhaps should be evaluated in the same way. Instead of needing FDA approval for every function it can perform, a chatbot might be required to pass a medical licensing exam and undergo a period of supervision similar to a medical residency.
But for now, the idea remains vague. Haider Warraich, a cardiologist and program manager at the Advanced Research Projects Agency for Health (ARPA-H), the U.S. government’s advanced medical technology development program, is leading a large-scale effort to get medical chatbots approved through traditional channels. His agency will fund the development of an AI tool customized for heart disease and send it through a full FDA approval process. Warraich’s hope is that after such rigorous evaluation, chatbots will be able to safely assess and treat patients without a doctor’s intervention. Rodman praised the approach but warned that the process would take years, during which time a slew of new health AIs would be brought to market with little scrutiny.
In this way, today’s emergence of AI health products is reminiscent of the rise of ride-sharing services like Uber and Lyft in the 2010s. The taxi industry is highly regulated, making it difficult for newcomers to enter the market. But by circumventing, and sometimes ignoring, these rules, ride-hailing companies were able to quickly gain significant numbers of users. Soon, the government had little choice but to adjust the law to fit the new status quo. The same pattern could occur in medicine. Will regulations to ensure the safety and effectiveness of medical products remain in place, or will they instead be weakened or removed to make way for tools that everyone already uses?
We’ll find out soon enough. The health system isn’t going to “relax and wait for the evidence,” Rodman told me. According to a 2026 survey by the American Medical Association, 80% of physicians are already using AI tools in their practice. Patients are not far behind. While the benefits of AI may still be uncertain, it is already too attractive to ignore.
