
MedGemma Technical Report

July 7, 2025
Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang
cs.AI

Abstract

Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves a 2.6-10% improvement on medical multimodal question answering, a 15.5-18.1% improvement on chest X-ray finding classification, and a 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching performance comparable to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as an encoder, achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and the development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.