Google Research and DeepMind have released MedGemma, an open-source collection of AI models built specifically for medical use.
The MedGemma family includes a 4B model that can handle text, images, or both, and a larger 27B version available in text-only and multimodal variants. Google featured the collection at this year's I/O conference.
MedGemma is designed for a wide range of medical fields, including radiology, dermatology, histopathology, and ophthalmology. According to Google, the models are meant to serve as a foundation for new healthcare AI tools and can work either independently or within an agent-based system.

MedGemma outperforms the base models
The technical report shows that MedGemma offers significant improvements over base models of similar size. On specialized medical tasks, the models achieve up to 10% higher accuracy in multimodal question answering, 15.5 to 18.1% better results in X-ray classification, and 10.8% better results in complex agent-based evaluations.
The benchmark scores reflect this. On MedQA, which tests medical exam questions, the 4B model reaches 64.4% accuracy, compared to 50.7% for the baseline. The 27B version scores 87.7%, up from 74.9%.

MedGemma also outperforms the base model on medical image benchmarks. On the MIMIC-CXR dataset of X-ray images and reports, the 4B version posted a macro F1 score of 88.9, compared to 81.2 for the original Gemma 3 4B model. The macro F1 score averages the per-condition F1 scores, so it tracks accuracy across a variety of medical conditions rather than only the most common ones.
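To make the macro F1 metric concrete, here is a minimal Python sketch of how such a score is typically computed: one F1 score per condition, then an unweighted average. The condition names and counts are hypothetical and purely illustrative; they are not taken from MIMIC-CXR or from Google's report.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 scores, so rare classes count as much as common ones."""
    scores = [f1_score(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts per condition; illustrative only.
counts = {
    "condition_a": (90, 10, 10),
    "condition_b": (8, 2, 2),
    "condition_c": (40, 10, 30),
}

print(f"macro F1: {100 * macro_f1(counts):.1f}")
```

Because every condition gets equal weight, a model that does well on frequent findings but poorly on rare ones is pulled down by the rare classes.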
MedSigLIP: a specialized image encoder
For image processing, Google has introduced MedSigLIP, a 400-million-parameter medical image encoder. MedSigLIP is based on SigLIP ("Sigmoid Loss for Language Image Pre-training"), a system designed to link images to text. The medical version extends this approach so that MedGemma can interpret medical images more effectively.
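The idea behind the sigmoid loss in the SigLIP name can be sketched in a few lines of Python. This is a simplified illustration of the general pairwise loss, not MedSigLIP's training code: every image-text pair in a batch is treated as an independent match or no-match decision. The function names, the toy embeddings, and the temperature and bias values are assumptions chosen for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_sigmoid_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """SigLIP-style loss over all image-text pairs in a batch.

    Assumes embeddings are L2-normalized and that image i belongs with text i.
    The temperature and bias defaults are illustrative assumptions.
    """
    total = 0.0
    for i, img in enumerate(image_embs):
        for j, txt in enumerate(text_embs):
            label = 1.0 if i == j else -1.0  # matching pairs sit on the diagonal
            logit = temperature * dot(img, txt) + bias
            total -= log_sigmoid(label * logit)
    return total / len(image_embs)


# Toy unit-length embeddings: aligned pairs vs. deliberately swapped texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = pairwise_sigmoid_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = pairwise_sigmoid_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

The loss is lower when each image embedding points in the same direction as its own caption, which is what pushes the encoder to link images with the right text.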

MedSigLIP handles the interpretation of medical images for models such as MedGemma 27B, making the combination a powerful multimodal system for healthcare. The encoder runs at a resolution of 448 x 448 pixels, which is more efficient than the higher-resolution 896 x 896 variant used in MedGemma.
The model was trained on more than 33 million image-text pairs, including 635,000 examples from various medical domains and 32.6 million histopathology patches. To preserve SigLIP's general image recognition abilities, the original training data was retained, with medical data making up 2% of the total, so the encoder can process both general and medical content.

Fine-tuning for real-world medical tasks
The researchers showed how MedGemma can be fine-tuned for specific medical tasks. For automated X-ray report generation, the RadGraph F1 score improved from 29.5 to 30.3, indicating better capture of essential clinical information. For pneumothorax (collapsed lung) detection, accuracy jumped from 59.7 to 71.5. In histopathology, the weighted F1 score for tissue classification increased from 32.8 to 94.5.
Electronic health record analysis also saw a major leap. Reinforcement learning cut the error rate in data retrieval by half, promising more efficient processing of patient data.
MedGemma is available on Hugging Face. The license allows use for research, development, and general AI applications, but direct medical diagnosis or treatment is not permitted without regulatory approval. Commercial use is permitted as long as these restrictions are observed.
Benchmarks are not the same as the real world
Last year, Google launched a medical AI model built on its closed Gemini platform. MedGemma's open-source foundation and support for customization could encourage wider adoption.
Still, strong benchmark results do not always carry over to clinical practice. One study found that real-world usefulness is limited by misunderstandings and incorrect user interactions, highlighting the gap between test scores and actual outcomes.
