MedGemma Technical Report

Abstract

Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerating the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching performance comparable to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as an encoder, achieves performance comparable to or better than that of specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with the potential to significantly accelerate medical research and the development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
