DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue

Abstract

Large language models (LLMs) have demonstrated excellent capabilities in biomedical question answering, but their application to real-world clinical consultations still faces core challenges. Existing systems rely on a one-way information transmission mode in which patients must fully describe their symptoms in a single round, which leads to nonspecific diagnostic recommendations when complaints are vague. Traditional multi-turn dialogue methods based on supervised learning are constrained by static, data-driven paradigms; they lack generalizability and struggle to intelligently extract key clinical information. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we constructed MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments show that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance, demonstrating practical value in assisting clinical consultations. https://github.com/JarvisUSTC/DoctorAgent-RL
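
The interaction pattern described in the abstract (a doctor agent questioning a patient agent over several turns, then being scored by an evaluator) can be illustrated with a minimal Python sketch. Everything here is hypothetical and is not taken from the DoctorAgent-RL repository: the class names, the rule-based toy agents standing in for LLMs, and the reward terms (a correctness bonus, a bonus per informative answer, and a per-turn cost) are assumptions chosen only for illustration. The sketch shows a single rollout and its reward; the RL update that would use this reward to improve the doctor policy is omitted.

```python
from typing import Dict, List, Tuple

UNKNOWN = "I'm not sure."
Turn = Tuple[str, str]  # (doctor question, patient answer)


class PatientAgent:
    """Toy patient simulator: reveals a symptom only when asked about it."""

    def __init__(self, symptoms: Dict[str, str]) -> None:
        self.symptoms = symptoms  # topic keyword -> scripted answer

    def answer(self, question: str) -> str:
        for topic, reply in self.symptoms.items():
            if topic in question.lower():
                return reply
        return UNKNOWN


class DoctorAgent:
    """Toy doctor policy: asks about a fixed list of topics, then diagnoses.

    In an RL setting this fixed policy would be replaced by a trainable model.
    """

    def __init__(self, topics: List[str]) -> None:
        self.topics = topics

    def act(self, history: List[Turn]) -> Tuple[str, str]:
        if len(history) < len(self.topics):
            return "ask", f"Do you have any {self.topics[len(history)]}?"
        return "diagnose", self.diagnose(history)

    def diagnose(self, history: List[Turn]) -> str:
        answers = " ".join(a.lower() for _, a in history)
        if "fever" in answers and "cough" in answers:
            return "influenza"
        return "unknown"


def run_episode(doctor: DoctorAgent, patient: PatientAgent,
                max_turns: int = 5) -> Tuple[List[Turn], str]:
    """Roll out one consultation: ask until the doctor commits to a diagnosis."""
    history: List[Turn] = []
    for _ in range(max_turns):
        kind, text = doctor.act(history)
        if kind == "diagnose":
            return history, text
        history.append((text, patient.answer(text)))
    # Turn budget exhausted: force a final diagnosis from what was gathered.
    return history, doctor.diagnose(history)


def evaluate(history: List[Turn], diagnosis: str, true_diagnosis: str,
             info_bonus: float = 0.2, turn_cost: float = 0.1) -> float:
    """Assumed reward: correctness + informative answers - cost per turn."""
    informative = sum(1 for _, a in history if a != UNKNOWN)
    correct = 1.0 if diagnosis == true_diagnosis else 0.0
    return correct + info_bonus * informative - turn_cost * len(history)


if __name__ == "__main__":
    patient = PatientAgent({"fever": "Yes, I have had a fever for two days.",
                            "cough": "Yes, a dry cough."})
    doctor = DoctorAgent(["fever", "cough", "rash"])
    history, diagnosis = run_episode(doctor, patient)
    print(diagnosis, evaluate(history, diagnosis, "influenza"))
```

In this toy setup, the uninformative question about a rash lowers the reward through the per-turn cost, which is the kind of signal a learned questioning policy could use to drop low-value questions.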
