RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

Abstract

Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet existing methods largely relegate clinicians to passive observers of final outputs and offer no interpretable reasoning trace that they can inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise, interpretable process. Each report is accompanied by a fully inspectable trace of intermediate decisions and tool interactions, so clinicians can examine how each reported finding was derived. In our experiments, RadAgent improves chest CT report generation over its 3D VLM counterpart, CT-Chat, along three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and by 5.4 points (19.6% relative) in micro-F1. Robustness under adversarial conditions improves by 24.7 points (41.9% relative). Finally, RadAgent reaches 37.0% in faithfulness, a capability entirely absent in its 3D VLM counterpart. By structuring chest CT interpretation as an explicit, tool-augmented, iterative reasoning trace, RadAgent brings us closer to transparent and reliable AI for radiology.
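The abstract does not specify RadAgent's tools, planner, or trace format. The Python sketch below only illustrates the general pattern it describes: a tool-using agent that logs every intermediate decision and tool call next to its final report, so the report can be audited step by step. All names here (`run_agent`, `TraceStep`, the `segment` and `classify` tools, and the scripted planner `toy_plan`) are hypothetical stand-ins, not RadAgent's components; in a real agent the planner would be a language model and the tools would operate on the CT volume.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class TraceStep:
    """One inspectable step: which tool was called, with what input, and what it returned."""
    step: int
    tool: str
    tool_input: str
    observation: str

@dataclass
class AgentRun:
    report: str
    trace: List[TraceStep] = field(default_factory=list)

# Hypothetical planner signature: given the case ID and the trace so far,
# return (tool_name, argument) or ("finish", report_text).
Planner = Callable[[str, List[TraceStep]], Tuple[str, str]]

def run_agent(volume_id: str, plan: Planner,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> AgentRun:
    """Generic agent loop (a sketch, not RadAgent's implementation).

    Every tool call is appended to the trace, so the final report is
    always returned together with the steps that produced it.
    """
    trace: List[TraceStep] = []
    for i in range(max_steps):
        action, arg = plan(volume_id, trace)
        if action == "finish":
            return AgentRun(report=arg, trace=trace)
        observation = tools[action](arg)
        trace.append(TraceStep(step=i, tool=action, tool_input=arg,
                               observation=observation))
    # Step budget exhausted: still return the partial trace for inspection.
    return AgentRun(report="(no report: step budget exhausted)", trace=trace)

# --- Toy stand-ins for illustration only (hypothetical tools) ---
def toy_segment(region: str) -> str:
    return f"mask for {region}"

def toy_classify(region: str) -> str:
    return f"no nodule detected in {region}"

def toy_plan(volume_id: str, trace: List[TraceStep]) -> Tuple[str, str]:
    # Scripted policy standing in for a language-model planner.
    if len(trace) == 0:
        return ("segment", "lungs")
    if len(trace) == 1:
        return ("classify", "lungs")
    return ("finish", f"Findings for {volume_id}: {trace[-1].observation}.")

if __name__ == "__main__":
    run = run_agent("ct_0001", toy_plan,
                    {"segment": toy_segment, "classify": toy_classify})
    for s in run.trace:
        print(s.step, s.tool, s.tool_input, "->", s.observation)
    print(run.report)
```

Returning the trace even when the step budget runs out reflects the point made in the abstract: the value lies in letting a clinician see how a conclusion was (or was not) reached, not only the final text.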
