MedGemma Technical Report
July 7, 2025
Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang
cs.AI
Abstract
Artificial intelligence (AI) has significant potential in healthcare
applications, but its training and deployment face challenges due to
healthcare's diverse data, complex tasks, and the need to preserve privacy.
Foundation models that perform well on medical tasks and require less
task-specific tuning data are critical to accelerate the development of
healthcare AI applications. We introduce MedGemma, a collection of medical
vision-language foundation models based on Gemma 3 4B and 27B. MedGemma
demonstrates advanced medical understanding and reasoning on images and text,
significantly exceeding the performance of similar-sized generative models and
approaching the performance of task-specific models, while maintaining the
general capabilities of the Gemma 3 base models. For out-of-distribution tasks,
MedGemma achieves 2.6-10% improvement on medical multimodal question answering,
15.5-18.1% improvement on chest X-ray finding classification, and 10.8%
improvement on agentic evaluations compared to the base models. Fine-tuning
MedGemma further improves performance in subdomains, reducing errors in
electronic health record information retrieval by 50% and reaching comparable
performance to existing specialized state-of-the-art methods for pneumothorax
classification and histopathology patch classification. We additionally
introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
MedSigLIP powers the visual understanding capabilities of MedGemma and as an
encoder achieves comparable or better performance than specialized medical
image encoders. Taken together, the MedGemma collection provides a strong
foundation of medical image and text capabilities, with potential to
significantly accelerate medical research and development of downstream
applications. The MedGemma collection, including tutorials and model weights,
can be found at https://goo.gle/medgemma.