MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning
February 3, 2026
Authors: Shengyuan Liu, Liuxin Bao, Qi Yang, Wanting Geng, Boyun Zheng, Chenxin Li, Wenting Chen, Houwen Peng, Yixuan Yuan
cs.AI
Abstract
Medical image segmentation is evolving from task-specific models toward generalizable frameworks. Recent research leverages Multi-modal Large Language Models (MLLMs) as autonomous agents, employing reinforcement learning with verifiable rewards (RLVR) to orchestrate specialized tools such as the Segment Anything Model (SAM). However, these approaches often rely on rigid, single-turn interaction strategies and lack process-level supervision during training, which prevents them from fully exploiting the dynamic potential of interactive tools and leads to redundant actions. To bridge this gap, we propose MedSAM-Agent, a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. First, we introduce a hybrid prompting strategy for expert-curated trajectory generation, enabling the model to internalize human-like decision heuristics and adaptive refinement strategies. Furthermore, we develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification with a clinical-fidelity process reward design to promote interaction parsimony and decision efficiency. Extensive experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance, effectively unifying autonomous medical reasoning with robust, iterative optimization. Code is available at https://github.com/CUHK-AIM-Group/MedSAM-Agent.
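To make the idea of combining outcome verification with a reward that favors interaction parsimony more concrete, the following is a minimal illustrative sketch, not the authors' reward design. It assumes that the outcome is verified with a Dice overlap against a ground-truth mask and that each extra interaction turn costs a fixed penalty; the function names `dice` and `trajectory_reward` and the `turn_penalty` value are hypothetical and do not come from the paper or its code.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def trajectory_reward(
    masks_per_turn: Sequence[Sequence[int]],
    target: Sequence[int],
    turn_penalty: float = 0.05,  # hypothetical value, chosen for illustration
) -> float:
    """Assumed reward: Dice of the final mask minus a cost per extra turn."""
    if not masks_per_turn:
        return 0.0
    outcome = dice(masks_per_turn[-1], target)
    extra_turns = len(masks_per_turn) - 1
    return outcome - turn_penalty * extra_turns


# Toy 1-D "masks": one coarse prompt, then a refinement, then a redundant turn.
target = [0, 1, 1, 1, 0, 0]
coarse = [0, 1, 1, 0, 0, 0]
refined = [0, 1, 1, 1, 0, 0]

one_turn = trajectory_reward([coarse], target)
two_turns = trajectory_reward([coarse, refined], target)
three_turns = trajectory_reward([coarse, refined, refined], target)
```

Under these assumptions, the refinement turn raises the reward because the gain in overlap outweighs its cost, while the redundant third turn lowers it. This is the kind of trade-off a parsimony-oriented reward is meant to encode.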