MedGemma Technical Report

Abstract

Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerating the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. On out-of-distribution tasks, MedGemma achieves a 2.6-10% improvement on medical multimodal question answering, a 15.5-18.1% improvement on chest X-ray finding classification, and a 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching performance comparable to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as an encoder, achieves performance comparable to or better than that of specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with the potential to significantly accelerate medical research and the development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.