MedSAM3: Delving into Segment Anything with Medical Concepts
November 24, 2025
Authors: Anglin Liu, Rundong Xue, Xu R. Cao, Yifan Shen, Yi Lu, Xiang Li, Qianqian Chen, Jintai Chen
cs.AI
Abstract
Medical image segmentation is fundamental to biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic concept labels, our MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than geometric prompts alone. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
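The abstract describes two interfaces: open-vocabulary text prompting (PCS) and an agent-in-the-loop refinement workflow. The sketch below illustrates, under stated assumptions, what such interfaces could look like; the class and function names (TextPromptSegmenter, segment, agent_refine) are hypothetical stand-ins for illustration only, not the released MedSAM-3 API.

```python
# Minimal, runnable sketch of the two interfaces the abstract describes:
# (1) Promptable Concept Segmentation (PCS) driven by an open-vocabulary
#     text prompt instead of a geometric prompt (point/box), and
# (2) an agent-in-the-loop refinement step in which an MLLM would critique
#     the mask and rewrite the prompt.
# All names here are illustrative assumptions, not the actual MedSAM-3 API.

import numpy as np


class TextPromptSegmenter:
    """Toy stand-in for a text-promptable medical segmentation model."""

    def segment(self, image: np.ndarray, prompt: str) -> np.ndarray:
        # A real model would localize the anatomical structure named in
        # `prompt`; this stub returns an empty mask of matching shape.
        return np.zeros(image.shape[:2], dtype=bool)


def agent_refine(model: TextPromptSegmenter, image: np.ndarray,
                 prompt: str, max_rounds: int = 3) -> np.ndarray:
    """Hypothetical agent-in-the-loop workflow: each round, an MLLM would
    inspect (image, mask) and either accept the result or propose a
    refined text prompt. The MLLM critique is stubbed out here."""
    mask = model.segment(image, prompt)
    for _ in range(max_rounds):
        refined_prompt = prompt  # stub: a real MLLM call would go here
        if refined_prompt == prompt:
            break  # critique accepted the current mask
        prompt = refined_prompt
        mask = model.segment(image, prompt)
    return mask


if __name__ == "__main__":
    ct_slice = np.random.rand(512, 512).astype(np.float32)  # placeholder slice
    mask = agent_refine(TextPromptSegmenter(), ct_slice, "left kidney")
    print(mask.shape, int(mask.sum()))
```

The point of the sketch is the calling convention: the segmentation target is named in free text ("left kidney") rather than clicked or boxed, and the agent loop treats the segmenter as a tool whose output can be iteratively re-prompted.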