MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

Abstract

Current evaluations of vision-language models (VLMs) on medical imaging tasks oversimplify clinical reality by relying on pre-selected 2D images whose curation demands significant manual labor. This setup misses the core challenge of real-world diagnostics: a true clinical agent must actively navigate full 3D volumes across multiple sequences or modalities to gather evidence and ultimately support a final decision. To address this, we propose MEDOPENCLAW, an auditable runtime that lets VLMs operate dynamically within standard medical tools and viewers (e.g., 3D Slicer). On top of this runtime, we introduce MEDFLOWBENCH, a full-study medical imaging benchmark covering multi-sequence brain MRI and lung CT/PET, which systematically evaluates medical agentic capabilities across viewer-only, tool-use, and open-method tracks. Initial results reveal a critical insight: while state-of-the-art LLMs/VLMs (e.g., Gemini 3.1 Pro and GPT-5.4) can successfully navigate the viewer to solve basic study-level tasks, their performance paradoxically degrades when they are given access to professional support tools, owing to a lack of precise spatial grounding. By bridging the gap between static-image perception and interactive clinical workflows, MEDOPENCLAW and MEDFLOWBENCH establish a reproducible foundation for developing auditable, full-study medical imaging agents.
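The following minimal Python sketch illustrates what an auditable runtime with per-track restrictions might look like. It is not the MedOpenClaw API. The class `AuditLog`, the mapping `TRACK_ACTIONS`, and every action name (`load_series`, `scroll_slice`, `run_segmentation`, and so on) are hypothetical. The sketch rests on three assumptions:

- The viewer-only track admits only navigation actions.
- The tool-use track adds support tools on top of those actions.
- The open-method track is unrestricted.

Every accepted action is appended to an ordered, serializable trace.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical per-track action vocabularies (illustrative assumption, not
# MedOpenClaw's real action set). A track mapped to None is unrestricted.
VIEWER_ACTIONS = {"load_series", "scroll_slice", "set_window_level", "switch_view"}
TRACK_ACTIONS: dict[str, Optional[set[str]]] = {
    "viewer-only": VIEWER_ACTIONS,
    "tool-use": VIEWER_ACTIONS | {"run_segmentation", "measure_volume"},
    "open-method": None,
}


@dataclass
class AuditLog:
    """Ordered record of every action an agent takes during one study."""
    track: str
    steps: list[dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fail early on a track name the runtime does not know about.
        if self.track not in TRACK_ACTIONS:
            raise ValueError(f"unknown track: {self.track!r}")

    def record(self, action: str, args: dict[str, Any], observation: str) -> None:
        """Reject actions outside the track; otherwise append a numbered step."""
        allowed = TRACK_ACTIONS[self.track]
        if allowed is not None and action not in allowed:
            raise PermissionError(f"{action!r} is not permitted in track {self.track!r}")
        self.steps.append({"step": len(self.steps), "action": action,
                           "args": args, "observation": observation})

    def to_json(self) -> str:
        """Serialize the trace so a reviewer can inspect it after the run."""
        return json.dumps(self.steps, indent=2)


if __name__ == "__main__":
    log = AuditLog(track="viewer-only")
    log.record("load_series", {"series": "T1c"}, "3D volume loaded")
    log.record("scroll_slice", {"axis": "axial", "index": 87}, "lesion visible")
    try:
        # A support tool is refused in the viewer-only track.
        log.record("run_segmentation", {"target": "lesion"}, "")
    except PermissionError as err:
        print(err)  # 'run_segmentation' is not permitted in track 'viewer-only'
    print(log.to_json())
```

In this sketch, gating happens before logging, so a refused action leaves no step in the trace. A real runtime might instead log refusals too, which would make failed tool calls visible to auditors.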
