DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue

Abstract

Large language models (LLMs) have demonstrated strong capabilities in biomedical question answering, but their application in real-world clinical consultations still faces core challenges. Existing systems rely on a one-way information transmission mode in which patients must fully describe their symptoms in a single round, which leads to nonspecific diagnostic recommendations when complaints are vague. Traditional multi-turn dialogue methods based on supervised learning are constrained by static, data-driven paradigms: they generalize poorly and struggle to extract key clinical information intelligently. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultation as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we constructed MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments show that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance, indicating practical value in assisting clinical consultations.

Repository: https://github.com/JarvisUSTC/DoctorAgent-RL
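
The interaction pattern described in the abstract (a doctor agent questioning a patient agent over several turns, then receiving a reward from an evaluator) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the "DIAGNOSIS:" prefix convention, the scripted toy agents, and the reward shape (correctness minus a small per-turn cost) are all assumptions made for illustration, and the RL policy update itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, utterance) pairs


@dataclass
class Episode:
    """Record of one simulated consultation (hypothetical structure)."""
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    diagnosis: str = ""
    reward: float = 0.0


def run_consultation(
    doctor: Callable[[History], str],
    patient: Callable[[str], str],
    evaluator: Callable[[Episode], float],
    chief_complaint: str,
    max_turns: int = 5,
) -> Episode:
    """Roll out one multi-turn dialogue and score it with the evaluator."""
    episode = Episode()
    history: History = [("patient", chief_complaint)]
    for _ in range(max_turns):
        action = doctor(history)
        # Assumed convention: the doctor ends the dialogue with "DIAGNOSIS: ...".
        if action.startswith("DIAGNOSIS:"):
            episode.diagnosis = action[len("DIAGNOSIS:"):].strip()
            break
        answer = patient(action)
        history.append(("doctor", action))
        history.append(("patient", answer))
        episode.turns.append((action, answer))
    # In an RL setup, this scalar would drive the policy update (not shown).
    episode.reward = evaluator(episode)
    return episode


# Toy stand-ins for the LLM-based agents, purely for demonstration.
def toy_doctor(history: History) -> str:
    questions = ["Where is the pain?", "Does light make it worse?"]
    asked = sum(1 for role, _ in history if role == "doctor")
    return questions[asked] if asked < len(questions) else "DIAGNOSIS: migraine"


def toy_patient(question: str) -> str:
    answers = {
        "Where is the pain?": "On one side of my head.",
        "Does light make it worse?": "Yes, a lot.",
    }
    return answers.get(question, "I am not sure.")


def toy_evaluator(episode: Episode) -> float:
    # Assumed reward shape: 1 for a correct diagnosis, minus 0.1 per question.
    correct = 1.0 if episode.diagnosis == "migraine" else 0.0
    return correct - 0.1 * len(episode.turns)


episode = run_consultation(toy_doctor, toy_patient, toy_evaluator, "I have a headache.")
print(episode.diagnosis, len(episode.turns), round(episode.reward, 2))
```

In this toy setup the per-turn cost gives the doctor an incentive to ask only informative questions, which is one simple way to express the trade-off between information gathering and dialogue length; the reward actually used by the Consultation Evaluator may differ.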
