Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation
January 15, 2026
Authors: Chongcong Jiang, Tianxingjian Ding, Chuhan Song, Jiachen Tu, Ziyang Yan, Yihua Shao, Zhenyi Wang, Yuzhang Shang, Tianyu Han, Yu Tian
cs.AI
Abstract
Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicability to medical image segmentation remains limited by severe domain shifts, the absence of privileged spatial prompts, and the need to reason over complex anatomical and volumetric structures. Here we present Medical SAM3, a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D and 3D medical imaging datasets with paired segmentation masks and text prompts. Through a systematic analysis of vanilla SAM3, we observe that its performance degrades substantially on medical data, with its apparent competitiveness largely relying on strong geometric priors such as ground-truth-derived bounding boxes. These findings motivate full model adaptation beyond prompt engineering alone. By fine-tuning SAM3's model parameters on 33 datasets spanning 10 medical imaging modalities, Medical SAM3 acquires robust domain-specific representations while preserving prompt-driven flexibility. Extensive experiments across organs, imaging modalities, and dimensionalities demonstrate consistent and significant performance gains, particularly in challenging scenarios characterized by semantic ambiguity, complex morphology, and long-range 3D context. Our results establish Medical SAM3 as a universal, text-guided segmentation foundation model for medical imaging and highlight the importance of holistic model adaptation for achieving robust prompt-driven segmentation under severe domain shift. Code and model will be made available at https://github.com/AIM-Research-Lab/Medical-SAM3.
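The core recipe is full-parameter supervised fine-tuning on paired (image, text prompt, mask) data. Below is a minimal PyTorch sketch of such a loop; the model's forward signature, the data loader, and all names are hypothetical stand-ins, since the abstract does not specify SAM3's actual interface.

```python
# Minimal sketch of the full-parameter fine-tuning recipe described above,
# assuming a SAM3-style model whose forward pass maps (image batch, text
# prompts) to per-pixel mask logits. The forward signature and the data
# loader are hypothetical stand-ins, not the released SAM3 API.
import torch
import torch.nn.functional as F

def dice_loss(logits, targets, eps=1e-6):
    """Soft Dice loss on sigmoid probabilities, a standard segmentation objective."""
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + targets.sum(dim=(-2, -1))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def finetune(model, loader, epochs=10, lr=1e-5, device="cuda"):
    model.to(device).train()
    # Full model adaptation: every parameter is trainable, as opposed to
    # prompt engineering or freezing the backbone and tuning adapters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    for _ in range(epochs):
        # Each batch pairs images with free-text prompts and ground-truth masks.
        for images, text_prompts, masks in loader:
            images, masks = images.to(device), masks.to(device)
            logits = model(images, text_prompts)  # hypothetical text-conditioned forward
            loss = dice_loss(logits, masks) \
                 + F.binary_cross_entropy_with_logits(logits, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

The combined Dice + binary cross-entropy objective is a common choice for medical segmentation (Dice counteracts foreground/background class imbalance, BCE stabilizes per-pixel gradients); the paper itself may use a different loss.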