MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

Abstract

Current evaluations of vision-language models (VLMs) on medical imaging tasks oversimplify clinical reality: they rely on pre-selected 2D images that take significant manual labor to curate. This setup misses the core challenge of real-world diagnostics, where a clinical agent must actively navigate full 3D volumes across multiple sequences or modalities to gather evidence and ultimately support a final decision. To address this, we propose MEDOPENCLAW, an auditable runtime designed to let VLMs operate dynamically within standard medical tools or viewers (e.g., 3D Slicer). On top of this runtime, we introduce MEDFLOWBENCH, a full-study medical imaging benchmark covering multi-sequence brain MRI and lung CT/PET. It systematically evaluates medical agentic capabilities across three tracks: viewer-only, tool-use, and open-method. Initial results reveal a critical insight: while state-of-the-art LLMs/VLMs (e.g., Gemini 3.1 Pro and GPT-5.4) can successfully navigate the viewer to solve basic study-level tasks, their performance paradoxically degrades when they are given access to professional support tools, because they lack precise spatial grounding. By bridging the gap between static-image perception and interactive clinical workflows, MEDOPENCLAW and MEDFLOWBENCH establish a reproducible foundation for developing auditable, full-study medical imaging agents.
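To make the idea of an auditable runtime more concrete, the following is a minimal sketch of how viewer actions taken by an agent might be recorded as a replayable trace. It is an illustration under stated assumptions, not the MEDOPENCLAW implementation and not the 3D Slicer API: the class names `AuditedViewer` and `AuditRecord`, the actions `select_series` and `scroll`, and the series names and slice counts are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class AuditRecord:
    """One logged viewer action (hypothetical schema)."""
    step: int
    action: str
    args: dict[str, Any]
    state: dict[str, Any]  # viewer state after the action was applied


@dataclass
class AuditedViewer:
    """Toy stand-in for a medical image viewer; NOT the 3D Slicer API."""
    series: dict[str, int]  # series name -> number of slices (assumed layout)
    active: str = ""
    slice_index: int = 0
    log: list[AuditRecord] = field(default_factory=list)

    def _record(self, action: str, **args: Any) -> None:
        # Append the action together with the resulting viewer state.
        state = {"series": self.active, "slice": self.slice_index}
        self.log.append(AuditRecord(len(self.log), action, args, state))

    def select_series(self, name: str) -> None:
        if name not in self.series:
            raise KeyError(f"unknown series: {name}")
        self.active = name
        self.slice_index = 0
        self._record("select_series", name=name)

    def scroll(self, delta: int) -> None:
        if not self.active:
            raise RuntimeError("select a series before scrolling")
        # Clamp the new index to the valid slice range of the active series.
        last = self.series[self.active] - 1
        self.slice_index = max(0, min(last, self.slice_index + delta))
        self._record("scroll", delta=delta)

    def export_log(self) -> str:
        # Serialize the full trace so it can be inspected or replayed later.
        return json.dumps([asdict(r) for r in self.log])


# Example: an agent switches sequence and scrolls through the volume.
viewer = AuditedViewer(series={"T1": 150, "FLAIR": 120})
viewer.select_series("FLAIR")
viewer.scroll(200)   # request overshoots; index is clamped to the last slice
viewer.scroll(-10)
trace = json.loads(viewer.export_log())
```

In this sketch, every navigation step leaves a record of what was requested and what state the viewer ended up in, which is one plausible reading of "auditable": a reviewer can see which slices of which series the agent actually inspected before it answered.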
