MedResearcher-R1: Expert-Level Medical Deep Researcher via a Knowledge-Informed Trajectory Synthesis Framework

Abstract

Recent advances in Large Language Model (LLM)-based agents have demonstrated impressive capabilities across multiple domains, exemplified by deep research systems that excel at complex information-seeking and synthesis tasks. Yet general-purpose deep research agents struggle significantly with medical-domain challenges: even leading proprietary systems achieve only limited accuracy on complex medical benchmarks. We identify two key limitations: (1) the underlying models lack sufficiently dense medical knowledge for clinical reasoning, and (2) existing frameworks lack specialized retrieval tools tailored to medical contexts. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework based on medical knowledge graphs, which extracts the longest chains from subgraphs around rare medical entities to generate complex multi-hop question-answer pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate synthesis of medical information. Our approach generates more than 2,100 diverse trajectories across 12 medical specialties, with an average of 4.2 tool interactions per trajectory. Through a two-stage training paradigm that combines supervised fine-tuning with online reinforcement learning using composite rewards, our MedResearcher-R1-32B model establishes new state-of-the-art results on medical benchmarks while maintaining competitive performance on general deep research tasks. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains.
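The data-synthesis idea summarized in the abstract (take a subgraph around a rare medical entity, extract its longest chain, and turn that chain into a multi-hop question-answer pair) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how subgraphs are built, how rare seeds are chosen, or how questions are worded, so the toy graph, the placeholder entity and relation names, the function names (`k_hop_subgraph`, `longest_chain`, `chain_to_qa`), the k-hop neighbourhood, and the template question are all hypothetical assumptions.

```python
from collections import deque

# Hypothetical toy knowledge graph: head -> list of (relation, tail) edges.
# Entity and relation names are placeholders, not real medical facts.
KG = {
    "RareDiseaseX": [("caused_by", "GeneA"), ("treated_with", "DrugB")],
    "GeneA": [("encodes", "ProteinC")],
    "ProteinC": [("targeted_by", "DrugD")],
    "DrugB": [("contraindicated_with", "DrugD")],
    "DrugD": [("approved_for", "ConditionE")],
}


def k_hop_subgraph(kg, seed, k):
    """Assumed neighbourhood definition: entities reachable from `seed`
    in at most k directed hops (breadth-first search)."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for _, tail in kg.get(node, []):
            if tail not in seen:
                seen.add(tail)
                frontier.append((tail, depth + 1))
    return seen


def longest_chain(kg, seed, nodes):
    """Depth-first search for the longest simple path of (head, relation, tail)
    triples that starts at `seed` and only visits entities in `nodes`."""
    best = []

    def dfs(node, path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for rel, tail in kg.get(node, []):
            if tail in nodes and tail not in visited:
                visited.add(tail)
                path.append((node, rel, tail))
                dfs(tail, path, visited)
                path.pop()
                visited.remove(tail)

    dfs(seed, [], {seed})
    return best


def chain_to_qa(chain):
    """Hypothetical template: a naive multi-hop question whose answer
    is the final entity of the chain."""
    if not chain:
        raise ValueError("seed entity has no outgoing edges")
    head = chain[0][0]
    hops = " -> ".join(rel for _, rel, _ in chain)
    question = f"Starting from {head}, follow: {hops}. Which entity is reached?"
    return question, chain[-1][2]


if __name__ == "__main__":
    nodes = k_hop_subgraph(KG, "RareDiseaseX", k=4)
    chain = longest_chain(KG, "RareDiseaseX", nodes)
    question, answer = chain_to_qa(chain)
    print(question)
    print(answer)
```

On this toy graph the search prefers the four-hop path through GeneA and ProteinC over the three-hop path through DrugB, and the answer is ConditionE. Exhaustive depth-first search is exponential in the worst case, so it is only reasonable for small neighbourhoods like this one; a real pipeline would need stronger constraints on subgraph size and a far richer way of phrasing questions.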
