MedGemma Technical Report
July 7, 2025
Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang
cs.AI
Abstract
Artificial intelligence (AI) has significant potential in healthcare
applications, but its training and deployment face challenges due to
healthcare's diverse data, complex tasks, and the need to preserve privacy.
Foundation models that perform well on medical tasks and require less
task-specific tuning data are critical to accelerate the development of
healthcare AI applications. We introduce MedGemma, a collection of medical
vision-language foundation models based on Gemma 3 4B and 27B. MedGemma
demonstrates advanced medical understanding and reasoning on images and text,
significantly exceeding the performance of similar-sized generative models and
approaching the performance of task-specific models, while maintaining the
general capabilities of the Gemma 3 base models. For out-of-distribution tasks,
MedGemma achieves 2.6-10% improvement on medical multimodal question answering,
15.5-18.1% improvement on chest X-ray finding classification, and 10.8%
improvement on agentic evaluations compared to the base models. Fine-tuning
MedGemma further improves performance in subdomains, reducing errors in
electronic health record information retrieval by 50% and reaching comparable
performance to existing specialized state-of-the-art methods for pneumothorax
classification and histopathology patch classification. We additionally
introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
MedSigLIP powers the visual understanding capabilities of MedGemma and as an
encoder achieves comparable or better performance than specialized medical
image encoders. Taken together, the MedGemma collection provides a strong
foundation of medical image and text capabilities, with potential to
significantly accelerate medical research and development of downstream
applications. The MedGemma collection, including tutorials and model weights,
can be found at https://goo.gle/medgemma.