MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow

Abstract

Developing reliable AI systems to assist human clinicians in multi-modal medical diagnosis has long been a key objective for researchers. Recently, Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. With strong reasoning capabilities and the ability to perform diverse tasks based on user instructions, they hold great potential for enhancing medical diagnosis. However, directly applying MLLMs to the medical domain still presents challenges. They lack detailed perception of visual inputs, which limits their ability to perform the quantitative image analysis that medical diagnosis depends on. Additionally, MLLMs often exhibit hallucinations and inconsistencies in reasoning, whereas clinical diagnoses must adhere strictly to established criteria. To address these challenges, we propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses. This is accomplished through a hierarchical workflow. At the task level, knowledge-based reasoning generates reliable diagnostic plans for specific diseases following retrieved clinical criteria. At the case level, multiple tool agents process multi-modal inputs, analyze different indicators according to the plan, and provide a final diagnosis based on both quantitative and qualitative evidence. Comprehensive experiments on both 2D and 3D medical diagnosis tasks demonstrate the superiority and effectiveness of MedAgent-Pro, while case studies further highlight its reliability and interpretability. The code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
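The hierarchical workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`PlanStep`, `Evidence`, `make_plan`, `measure_tool`, `describe_tool`, `diagnose`), the placeholder disease and indicators, the 0.5 thresholds, and the fraction-based aggregation rule are assumptions made purely for illustration. The abstract does not specify how plans are retrieved, which tools are used, or how evidence is combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PlanStep:
    indicator: str  # name of the indicator to analyze
    tool: str       # key of the tool agent that handles it
    kind: str       # "quantitative" or "qualitative"


@dataclass
class Evidence:
    indicator: str
    kind: str
    abnormal: bool
    detail: str


def make_plan(disease: str) -> List[PlanStep]:
    """Task level: return a diagnostic plan for one disease.

    The criteria below are a hard-coded placeholder standing in for
    retrieved clinical criteria; they are not real guidelines.
    """
    placeholder_criteria = {
        "example_disease": [
            PlanStep("size_ratio", "measure_tool", "quantitative"),
            PlanStep("texture_finding", "describe_tool", "qualitative"),
        ]
    }
    return placeholder_criteria[disease]


def measure_tool(case: Dict, step: PlanStep) -> Evidence:
    """Case level: stub for a tool agent that returns a numeric measurement."""
    value = case["measurements"][step.indicator]
    abnormal = value > 0.5  # made-up threshold, illustration only
    return Evidence(step.indicator, step.kind, abnormal, f"value={value}")


def describe_tool(case: Dict, step: PlanStep) -> Evidence:
    """Case level: stub for a tool agent that returns a descriptive finding."""
    finding = case["findings"][step.indicator]
    return Evidence(step.indicator, step.kind, finding == "present", finding)


TOOLS: Dict[str, Callable[[Dict, PlanStep], Evidence]] = {
    "measure_tool": measure_tool,
    "describe_tool": describe_tool,
}


def diagnose(case: Dict, plan: List[PlanStep], min_fraction: float = 0.5) -> Dict:
    """Run each plan step through its tool and aggregate the evidence.

    Aggregation rule (an assumption): the case is flagged positive when
    at least `min_fraction` of the indicators are abnormal.
    """
    evidence = [TOOLS[step.tool](case, step) for step in plan]
    fraction = sum(e.abnormal for e in evidence) / len(evidence)
    return {"positive": fraction >= min_fraction, "evidence": evidence}


if __name__ == "__main__":
    plan = make_plan("example_disease")  # generated once per disease
    case = {
        "measurements": {"size_ratio": 0.7},
        "findings": {"texture_finding": "absent"},
    }
    result = diagnose(case, plan)  # applied to each individual case
    print(result["positive"])
    for item in result["evidence"]:
        print(item)
```

The point of the split is that the plan is built once per disease and then reused for every case, while the returned evidence list keeps each indicator's result visible alongside the final decision, which is what makes the outcome traceable.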
