DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue

Abstract

Large language models (LLMs) have demonstrated strong capabilities in biomedical question answering, but applying them to real-world clinical consultations still faces core challenges. Existing systems rely on a one-way information transmission mode in which patients must fully describe their symptoms in a single round, which leads to nonspecific diagnostic recommendations when complaints are vague. Traditional multi-turn dialogue methods based on supervised learning are constrained by a static, data-driven paradigm: they generalize poorly and struggle to extract key clinical information intelligently. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultation as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we construct MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments show that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance, indicating practical value in assisting clinical consultations. Code: https://github.com/JarvisUSTC/DoctorAgent-RL
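
To make the interaction loop described above more concrete, the following is a minimal toy sketch of a doctor agent questioning a patient agent and receiving a scalar reward from an evaluator. It is not the authors' implementation: the class names, the rule-based agents, the reward weights (`info_bonus`, `turn_cost`), and the example symptoms are all assumptions chosen for illustration, and no policy optimization is performed. It only shows how a reward of this kind could separate a good questioning order from a poor one, which is the signal an RL algorithm would then optimize.

```python
from dataclasses import dataclass


@dataclass
class PatientAgent:
    """Hypothetical patient: reveals a symptom only when asked about it."""
    hidden_symptoms: dict  # topic -> answer, unknown to the doctor up front

    def answer(self, topic: str) -> str:
        return self.hidden_symptoms.get(topic, "not sure")


@dataclass
class DoctorAgent:
    """Hypothetical doctor: asks topics in a fixed order, then diagnoses."""
    question_order: list
    rules: dict  # diagnosis -> set of (topic, answer) pairs it requires

    def diagnose(self, findings: dict) -> str:
        observed = set(findings.items())
        for diagnosis, required in self.rules.items():
            if required <= observed:  # all required evidence was collected
                return diagnosis
        return "unspecified"


def rollout(doctor: DoctorAgent, patient: PatientAgent, max_turns: int):
    """Run one multi-turn consultation and return (transcript, diagnosis)."""
    transcript, findings = [], {}
    for topic in doctor.question_order[:max_turns]:
        reply = patient.answer(topic)
        transcript.append((topic, reply))
        if reply != "not sure":
            findings[topic] = reply
    return transcript, doctor.diagnose(findings)


def evaluate(transcript, diagnosis, true_diagnosis,
             info_bonus=0.2, turn_cost=0.1) -> float:
    """Toy evaluator: diagnosis accuracy + informative answers - turn cost.
    The weights are illustrative assumptions, not values from the paper."""
    informative = sum(1 for _, reply in transcript if reply != "not sure")
    correct = 1.0 if diagnosis == true_diagnosis else 0.0
    return correct + info_bonus * informative - turn_cost * len(transcript)


# Illustrative case (invented data, not from MTMedDialog).
patient = PatientAgent(hidden_symptoms={"fever": "yes", "cough": "yes"})
rules = {"flu": {("fever", "yes"), ("cough", "yes")}}

focused = DoctorAgent(question_order=["fever", "cough"], rules=rules)
unfocused = DoctorAgent(question_order=["rash", "fever"], rules=rules)

t_a, d_a = rollout(focused, patient, max_turns=2)
t_b, d_b = rollout(unfocused, patient, max_turns=2)
reward_a = evaluate(t_a, d_a, true_diagnosis="flu")
reward_b = evaluate(t_b, d_b, true_diagnosis="flu")
print(d_a, d_b, reward_a > reward_b)
```

In this toy setting the focused questioning order reaches the correct diagnosis within the turn budget and earns the higher reward, while the unfocused one wastes a turn and falls back to a nonspecific answer. In an actual RL setup the fixed `question_order` would be replaced by a learned policy updated from such rewards.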