MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow

Abstract

Developing reliable AI systems to assist human clinicians in multi-modal medical diagnosis has long been a key objective for researchers. Recently, Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. With strong reasoning capabilities and the ability to perform diverse tasks based on user instructions, they hold great potential for enhancing medical diagnosis. However, directly applying MLLMs to the medical domain still presents challenges. MLLMs lack detailed perception of visual inputs, which limits their ability to perform the quantitative image analysis that medical diagnosis depends on. In addition, MLLMs often exhibit hallucinations and inconsistent reasoning, whereas clinical diagnoses must adhere strictly to established criteria. To address these challenges, we propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses. It does so through a hierarchical workflow. At the task level, knowledge-based reasoning generates reliable diagnostic plans for specific diseases, following retrieved clinical criteria. At the case level, multiple tool agents process multi-modal inputs, analyze different indicators according to the plan, and provide a final diagnosis based on both quantitative and qualitative evidence. Comprehensive experiments on both 2D and 3D medical diagnosis tasks demonstrate the superiority and effectiveness of MedAgent-Pro, while case studies further highlight its reliability and interpretability. The code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
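
The two-level workflow described in the abstract can be illustrated with a minimal sketch. This is not the code from the MedAgent-Pro repository: every name here (`Indicator`, `build_plan`, `diagnose`, the indicator names, the thresholds) is hypothetical, the tool agents are replaced by plain functions, and the weighted vote used for the final decision is an assumption, since the abstract does not say how evidence is aggregated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Indicator:
    """One step of a diagnostic plan (all fields are illustrative)."""
    name: str
    tool: Callable[[dict], float]  # stand-in for a tool agent
    threshold: float               # a value above this counts as abnormal
    weight: float = 1.0


def build_plan(criteria: List[dict],
               tools: Dict[str, Callable[[dict], float]]) -> List[Indicator]:
    """Task level: map retrieved criteria onto the tools that can measure them."""
    plan = []
    for c in criteria:
        if c["indicator"] not in tools:
            continue  # no available tool measures this indicator, so skip it
        plan.append(Indicator(c["indicator"], tools[c["indicator"]],
                              c["threshold"], c.get("weight", 1.0)))
    return plan


def diagnose(case: dict, plan: List[Indicator],
             cutoff: float = 0.5) -> Tuple[bool, Dict[str, Tuple[float, bool]]]:
    """Case level: run each tool, record the evidence, then take a weighted vote."""
    if not plan:
        raise ValueError("diagnostic plan is empty")
    evidence: Dict[str, Tuple[float, bool]] = {}
    abnormal_weight = 0.0
    total_weight = 0.0
    for step in plan:
        value = step.tool(case)
        abnormal = value > step.threshold
        evidence[step.name] = (value, abnormal)
        total_weight += step.weight
        if abnormal:
            abnormal_weight += step.weight
    # Assumed decision rule: positive if the abnormal share of weight reaches the cutoff.
    return abnormal_weight / total_weight >= cutoff, evidence


# Toy criteria and tools; real criteria would come from retrieved clinical guidelines.
criteria = [
    {"indicator": "ratio_a", "threshold": 0.6, "weight": 2.0},
    {"indicator": "texture_b", "threshold": 0.5},
    {"indicator": "unmeasurable_c", "threshold": 1.0},
]
tools = {
    "ratio_a": lambda case: case["ratio_a"],
    "texture_b": lambda case: case["texture_b"],
}
plan = build_plan(criteria, tools)
positive, evidence = diagnose({"ratio_a": 0.7, "texture_b": 0.2}, plan)
```

The returned `evidence` dictionary keeps the measured value and the abnormal flag for each indicator, which is one simple way to make the final decision traceable to the individual findings.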
