A new large-scale multimodal AI system trained on tens of millions of medical images could unify fragmented radiology tools and help doctors interpret scans and generate reports more efficiently.

Research: MedVersa: A generalist-based model for diverse medical image processing tasks. Image credit: Thitisan / Shutterstock
In a recent study published in the journal NEJM AI, researchers introduced MedVersa, a generalist artificial intelligence (AI) model that can interpret a wide range of medical imaging modalities and task types. Unlike conventional AI models trained for specific, narrow tasks, MedVersa is built on tens of millions of medical image instances and can both detect pathology and generate reports within a single, unified framework.
Encouragingly, in blinded evaluations comparing MedVersa's chest radiograph reports with those written by human radiologists, the model often produced reports that were judged clinically equivalent to human-written ones, particularly for scans with normal findings, and it significantly reduced the time radiologists spent documenting findings. Taken together, these results suggest that MedVersa is a promising step toward a new generation of multimodal foundation models that may help unify the currently fragmented ecosystem of AI tools used in clinical practice.
Background: Fragmentation of medical artificial intelligence tools
Recent advances in computing power and artificial intelligence (AI) model reasoning have led to several AI tools being approved for medical use, but their application remains fragmented. For example, a model trained on an X-ray dataset may accurately detect pneumonia on a patient's chest X-ray, yet it cannot draw on the same patient's MRI or ultrasound data for an overall assessment.
These "specialist" models often struggle to adapt to complex clinical workflows in which multiple data types contribute to a patient's diagnosis. To address this, computational biologists are developing generalist medical artificial intelligence (GMAI).
Their goal is to create a "foundation model" (similar to the technology underlying ChatGPT, Google Gemini, and other large language models [LLMs]) that can handle multimodal inputs and outputs. However, previous attempts to realize this concept have focused primarily on text-based inputs and have been unable to handle the complex visual tasks essential to radiology.
MedVersa Multimodal AI Model Development
The present study aimed to close this gap by developing MedVersa, a radiology-focused generalist AI model that can interpret, annotate, diagnose from, and report on multimodal clinical imaging data. The model was trained on MedInterp, a large dataset that aggregates 91 public datasets and contains more than 29 million medical instances. MedInterp includes images, bounding box annotations, segmentation masks, captions, and other vision-language supervision signals spanning a variety of image-interpretation tasks.
MedVersa uses a trained LLM as an "orchestrator": it evaluates the user's request (e.g., "Where is the patient's tumor?") and dynamically selects the appropriate internal vision module within the MedVersa framework to execute it. Unlike previous GMAI models, which were primarily text-based, MedVersa was designed either to generate text responses or to deploy specialized "vision modules" for object detection or segmentation.
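The orchestration idea can be illustrated with a minimal, hypothetical sketch. In MedVersa the routing decision is made by a trained LLM; here a simple keyword match stands in for that decision, and the module functions (`detect_objects`, `segment_region`, `generate_report`) and keyword lists are illustrative assumptions, not the model's real components. The point is only the control flow: read the request, pick one tool, fall back to a text answer when no vision tool applies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for internal vision modules.
# Each takes an image identifier and returns a short description of its output.
def detect_objects(image_id: str) -> str:
    return f"bounding boxes for {image_id}"

def segment_region(image_id: str) -> str:
    return f"segmentation mask for {image_id}"

def generate_report(image_id: str) -> str:
    return f"free-text report for {image_id}"

@dataclass
class Route:
    keywords: Tuple[str, ...]
    handler: Callable[[str], str]

# Ordered routing table: the first rule whose keyword appears in the request wins.
ROUTES: List[Route] = [
    Route(("where", "locate", "detect"), detect_objects),
    Route(("outline", "segment", "mask"), segment_region),
]

def orchestrate(request: str, image_id: str) -> str:
    """Pick a module for `request`; the keyword match is a stand-in for an LLM decision."""
    text = request.lower()
    for route in ROUTES:
        if any(word in text for word in route.keywords):
            return route.handler(image_id)
    # No vision tool matched: answer in text instead.
    return generate_report(image_id)

print(orchestrate("Where is the patient's tumor?", "ct_001"))
```

A keyword table is far cruder than a learned orchestrator, which can interpret free-form requests and patient context, but it keeps the separation the article describes: one component decides, specialized components execute.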
As a result, MedVersa can simultaneously process diverse inputs, such as 2D X-rays, 3D CT and MRI scans, and patient history text. After training, MedVersa's performance was validated across nine imaging tasks against two types of comparators: (1) established specialist AI models and (2) board-certified radiologists (n = 10).
Evaluation framework and comparative testing
For the performance evaluation, expert radiologists reviewed chest X-ray reports generated by MedVersa, GPT-4o, specialist AI models, or human radiologists. Importantly, the reviewers were blinded to the source of each report. Performance was scored on clinical accuracy and on efficiency (the time taken to complete the assessment and generate the report).
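What "blinded to the source" means in practice can be shown with a minimal, hypothetical sketch. The helper names (`blind_reports`, `unblind_scores`), the `R1`, `R2`, ... ID scheme, the fixed seed, and the example report texts are all illustrative assumptions, not the study's actual protocol: reports are shuffled and relabeled so reviewers see only neutral IDs, and the mapping back to sources is kept separately until scoring is done.

```python
import random
from typing import Dict, List, Tuple

def blind_reports(reports: Dict[str, str], seed: int = 0) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Hypothetical helper: hide each report's source behind a neutral ID."""
    rng = random.Random(seed)
    sources = list(reports)
    rng.shuffle(sources)  # random order, so position does not reveal the source
    shown: List[Tuple[str, str]] = []
    key: Dict[str, str] = {}
    for i, source in enumerate(sources, start=1):
        anon_id = f"R{i}"
        shown.append((anon_id, reports[source]))  # what reviewers see
        key[anon_id] = source                      # withheld from reviewers
    return shown, key

def unblind_scores(scores: Dict[str, float], key: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical helper: map reviewer scores back to report sources after review."""
    return {key[anon_id]: score for anon_id, score in scores.items()}

# Made-up example reports, for illustration only.
reports = {
    "MedVersa": "No acute cardiopulmonary process.",
    "GPT-4o": "Lungs are clear. No pleural effusion.",
    "radiologist": "No acute cardiopulmonary abnormality.",
}
shown, key = blind_reports(reports)
for anon_id, text in shown:
    print(anon_id, text)
```

Keeping the key separate from what reviewers see is the essential design choice; shuffling matters too, because a fixed presentation order would let reviewers infer the source from position alone.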
Findings: Performance across imaging tasks
The results showed that MedVersa's GMAI architecture is competitive with, and often outperforms, traditional "gold standard" specialist models on many object detection and segmentation metrics.
For report generation, MedVersa achieved a BLEU-4 score (a measure of text similarity to reference reports; higher is better) of 17.8, compared with 14.2 for MAIRA, 12.0 for BiomedGPT, and 11.5 for Med-PaLM M. On RadCliQ (a measure of deviation from human clinical reports; lower is better), MedVersa scored 2.71, compared with 3.10 for MAIRA and 3.25 for BiomedGPT. Med-PaLM M reported a slightly better RadCliQ score (2.67), which was statistically indistinguishable from MedVersa's.
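To make the BLEU-4 comparison more concrete, here is a minimal sketch of sentence-level BLEU-4 against a single reference. It assumes whitespace tokenization, no smoothing, and a 0-100 scale; the study most likely used a standard toolkit and corpus-level scoring, so numbers from this sketch are not comparable to the reported 17.8. BLEU-4 is the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty for candidates shorter than the reference.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-4 against one reference, scaled to 0-100, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped precision: each n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(log_precision_sum / 4)

print(bleu4("the heart size is normal and lungs are clear",
            "the heart size is normal and the lungs are clear"))
```

The unsmoothed version shows why short radiology sentences can score zero even when they are clinically equivalent: "no acute cardiopulmonary abnormality" versus "no acute cardiopulmonary process" shares no 4-gram, so its BLEU-4 is 0. This is one reason clinically oriented metrics such as RadCliQ are reported alongside it.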
Comparison with human radiologist reports
When comparing MedVersa with human experts, the researchers found that its reports were clinically equivalent to human-written reports in 64% of cases. For scans with normal findings, this equivalence rose to 91%. However, for scans with abnormal findings involving more complex pathology, equivalence was much lower, and the reviewing radiologists often preferred the human-written reports.
The researchers also showed that physicians using MedVersa as an assistant completed report-writing workflows more quickly. MedVersa assistance reduced report creation time and, importantly, resulted in fewer "urgent" discrepancies (errors requiring immediate attention) than reports produced by GPT-4o (20% reduction in 5-10 minute reporting intervals).
Conclusion: Towards an integrated clinical AI assistant
This study shows that MedVersa is an important step toward integrated clinical assistants, as an alternative to the traditional, fragmented collection of AI tools. By using an LLM to orchestrate dedicated vision tools, its architecture achieves performance that matches or exceeds that of specialist AI models across multiple tasks, while substantially streamlining and accelerating the workflow of expert radiologists.
However, the study also found that while MedVersa performed strongly on routine cases, board-certified radiologists were still preferred for complex or unusual cases, highlighting the importance of expert supervision. The authors also note that broader generalizability across imaging modalities remains an open challenge, as some of the non-chest X-ray datasets included in the study were dominated by segmentation tasks rather than full diagnostic interpretation.
Therefore, although this study validates MedVersa as a strong proof of concept, future GMAI models should be trained on expanded datasets that include more modalities (such as genetic information and electronic health records [EHRs]) to maximize the potential of patient care delivered jointly by AI and human experts.
Journal reference:
- Zhou, H.-Y., Acosta, J. N., Adithan, S., Datta, S., Topol, E. J., and Rajpurkar, P. (2026). MedVersa: A generalist-based model for diverse medical image processing tasks. NEJM AI. DOI: 10.1056/aioa2500595. https://ai.nejm.org/doi/full/10.1056/AIoa2500595
