A large language model outperformed physicians on a variety of clinical reasoning tasks, according to a new study. However, the study’s authors cautioned that this result does not mean AI tools are ready to practice medicine autonomously.
Since LLMs began proliferating across healthcare settings in late 2022, whether AI tools can accurately perform clinical reasoning tasks has become a top concern. In general, research shows that LLMs’ clinical reasoning abilities are improving, but the models still struggle with certain tasks and should remain under human supervision.
However, few studies have compared the clinical reasoning abilities of advanced LLMs against the baseline performance of human physicians. So researchers at Harvard Medical School and Beth Israel Deaconess Medical Center set out to establish those baselines and evaluate an LLM’s performance against them in a new study.
The researchers evaluated the clinical reasoning capabilities of OpenAI’s o1 model series, comparing the AI model’s performance with that of hundreds of physicians across a variety of experiments. The experiments drew on publicly available patient records, evaluations of newly arrived emergency room patients, and clinical tasks including diagnosis and clinical management planning.
Overall, the AI model outperformed physicians across the experiments, including one using real, unstructured clinical data from an emergency department EHR. In the ER experiment, the model was presented with patient information at successive points in the diagnostic process, from triage through the admission decision, and asked to generate a likely diagnosis and treatment plan at each step. Overall, o1 outperformed both GPT-4o and two expert attending physicians when evaluated by two other attending physicians.
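To give a concrete sense of what such a staged evaluation might look like, here is a minimal sketch that feeds an LLM progressively more of a patient record and requests a differential diagnosis at each stage. The clinical snippets, prompts, and staging are hypothetical illustrations, not the study’s actual protocol or data.

```python
# Minimal sketch of a staged diagnostic evaluation, assuming the OpenAI
# Python client. The clinical details and prompts below are hypothetical
# and do NOT reproduce the study's actual protocol or data.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical information revealed at each stage of the ER workflow.
stages = [
    ("triage", "58-year-old with acute chest pain radiating to the left arm."),
    ("initial workup", "ECG shows ST elevation in leads II, III, and aVF."),
    ("labs", "Troponin I elevated at 2.3 ng/mL."),
]

history = ""
for stage, new_info in stages:
    history += f"\n[{stage}] {new_info}"
    response = client.chat.completions.create(
        model="o1-preview",  # model name as reported in the article
        messages=[{
            "role": "user",
            "content": (
                "You are assisting with diagnostic reasoning. Given the "
                f"information available so far:{history}\n\n"
                "Provide a ranked differential diagnosis and a next-step plan."
            ),
        }],
    )
    print(f"--- {stage} ---")
    print(response.choices[0].message.content)
```

In the study itself, outputs like these were graded by attending physicians; a sketch like this only shows how information could be staged from triage to admission.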
In another experiment, the researchers used five clinical vignettes to test the AI model’s ability to recommend next steps in clinical management. Using a mixed-effects model, they found that o1-preview scored 41 percentage points higher than GPT-4 alone, 41.9 percentage points higher than physicians using GPT-4, and 48.4 percentage points higher than physicians using conventional resources.
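For readers unfamiliar with this kind of analysis, the sketch below shows how a mixed-effects comparison of scores across conditions might be set up with statsmodels. The scores and condition labels are made-up placeholders for illustration only; they are not the study’s data or results.

```python
# Illustrative mixed-effects comparison across conditions using
# statsmodels. All scores below are fabricated placeholders; they are
# NOT the study's data and exist only to show the model setup.
import pandas as pd
import statsmodels.formula.api as smf

# Each row: one graded response (score in %) for one vignette ("case")
# under one condition. A random intercept per case accounts for
# vignettes differing in difficulty.
data = pd.DataFrame({
    "case": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "condition": ["o1", "gpt4", "physician"] * 5,
    "score": [92, 55, 48, 88, 49, 41, 95, 60, 50,
              90, 47, 39, 86, 52, 44],
})

# Fixed effect: condition; random intercept: case (vignette).
model = smf.mixedlm(
    "score ~ C(condition, Treatment('physician'))",
    data,
    groups=data["case"],
)
result = model.fit()
print(result.summary())  # coefficients estimate percentage-point gaps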
“Our findings suggest that the LLM outperforms physicians on most clinical reasoning benchmarks,” the researchers concluded.
Does clinical AI still require a human in the loop?
In short, the answer is yes.
The researchers noted the study’s limitations, including that it investigated only six facets of clinical reasoning, while they identified dozens of others that could significantly affect real-world clinical care and warrant further research.
They also emphasized that the study evaluated only text-based performance for both the humans and the AI. Clinical medicine, however, is multifaceted, involving a variety of non-textual inputs, such as auditory and visual information.
“Models may make accurate diagnoses, but they can also suggest unnecessary tests that could put patients at risk,” Peter Brodeur, a clinical research fellow at Harvard Medical School and Beth Israel Deaconess Medical Center and co-lead author of the study, said in a press release. “Humans should be the ultimate standard when evaluating performance and safety.”
The researchers noted that as AI models evolve, new testing and research approaches are needed, including new benchmarks, human-computer interaction studies, and prospective clinical trials.
“Models are becoming more and more capable,” Brodeur said in the press release. “We used to evaluate models with multiple-choice tests, but now they consistently score close to 100%, and we can no longer track their progress because the benchmarks are already maxed out.”
Anuja Vaidya has been covering the healthcare industry since 2012. She currently covers virtual healthcare, including telemedicine, remote patient monitoring, and digital therapeutics.
