RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography
April 16, 2026
作者: Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor
cs.AI
Abstract
Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise and interpretable process. Each resulting report is accompanied by a fully inspectable trace of intermediate decisions and tool interactions, allowing clinicians to examine how the reported findings were derived. In our experiments, we observe that RadAgent improves chest CT report generation over its 3D VLM counterpart, CT-Chat, across three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and 5.4 points (19.6% relative) in micro-F1. Robustness under adversarial conditions improves by 24.7 points (41.9% relative). Furthermore, RadAgent achieves 37.0% in faithfulness, a capability entirely absent from its 3D VLM counterpart. By structuring the interpretation of chest CT as an explicit, tool-augmented, and iterative reasoning trace, RadAgent brings us closer to transparent and reliable AI for radiology.
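The stepwise, tool-augmented process described above can be illustrated with a minimal agent loop that records every tool interaction in an auditable trace. This is a hypothetical sketch of the general pattern, not the paper's actual architecture; all tool names, queries, and findings below are invented stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TraceStep:
    """One recorded tool interaction: what was asked, and what came back."""
    tool: str
    query: str
    result: str

@dataclass
class Agent:
    """Minimal tool-using agent that logs every call to an inspectable trace."""
    tools: dict[str, Callable[[str], str]]
    trace: list[TraceStep] = field(default_factory=list)

    def call(self, tool: str, query: str) -> str:
        # Every tool invocation is appended to the trace before the
        # result is used, so no finding can bypass the audit log.
        result = self.tools[tool](query)
        self.trace.append(TraceStep(tool, query, result))
        return result

    def report(self) -> str:
        # The final report is assembled only from traced tool outputs,
        # so each reported finding maps back to an intermediate step.
        return " ".join(step.result for step in self.trace)

# Toy stand-ins for imaging tools (purely illustrative).
tools = {
    "segment_lungs": lambda q: "Lungs segmented without volume loss.",
    "detect_nodules": lambda q: "No pulmonary nodules detected.",
}

agent = Agent(tools)
agent.call("segment_lungs", "series=chest_ct_001")
agent.call("detect_nodules", "series=chest_ct_001")
print(agent.report())
```

In this sketch, a clinician reviewing `agent.trace` sees which tool produced each sentence of the report, which is the kind of inspectable intermediate record the abstract describes.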