MedResearcher-R1: Expert-Level Medical Deep Researcher via a Knowledge-Informed Trajectory Synthesis Framework
August 20, 2025
Authors: Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu
cs.AI
Abstract
Recent developments in Large Language Model (LLM)-based agents have shown
impressive capabilities spanning multiple domains, exemplified by deep research
systems that perform strongly on complex information-seeking and synthesis
tasks. Despite these strengths, general-purpose deep research agents struggle
significantly with medical-domain challenges, as evidenced by the limited
accuracy that leading proprietary systems achieve on complex medical
benchmarks. The key limitations are: (1) the model lacks sufficient dense
medical knowledge for clinical reasoning, and (2) the framework is constrained
by the absence of specialized retrieval tools tailored for medical contexts.
We present a medical deep research agent that addresses
these challenges through two core innovations. First, we develop a novel data
synthesis framework using medical knowledge graphs, extracting the longest
chains from subgraphs around rare medical entities to generate complex
multi-hop question-answer pairs. Second, we integrate a custom-built private
medical retrieval engine alongside general-purpose tools, enabling accurate
medical information synthesis. Our approach generates more than 2,100 diverse
trajectories across 12 medical specialties, with an average of 4.2 tool
interactions per trajectory. Through a two-stage training paradigm that combines
supervised fine-tuning with online reinforcement learning using composite
rewards, our
MedResearcher-R1-32B model establishes new state-of-the-art results on medical
benchmarks while remaining competitive on general deep research tasks. Our work
shows that targeted domain-specific innovations in architecture, tool design,
and training data construction can enable smaller open-source models to
outperform much larger proprietary systems in specialized domains.
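The abstract describes the data synthesis step only at a high level: pick a
rare entity in a medical knowledge graph, take the subgraph around it, extract
the longest chain, and turn that chain into a multi-hop question-answer pair.
The Python sketch below illustrates that pipeline under several assumptions
that do not come from the paper: the graph is a directed adjacency list of
(relation, neighbor) edges, "rare" is approximated by a hypothetical corpus
mention count, the subgraph is the k-hop neighborhood found by breadth-first
search, the longest chain is found by exhaustive depth-first search over simple
paths, and questions are rendered from a fixed template. All helper names and
entities are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical KG format: entity -> list of (relation, neighbor) directed edges.
Graph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def k_hop_subgraph(graph: Graph, seed: str, k: int) -> Graph:
    """Keep only edges between entities reachable within k hops of the seed (BFS)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past the k-hop boundary
        for _, nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {
        node: [(rel, nbr) for rel, nbr in graph.get(node, []) if nbr in dist]
        for node in dist
    }


def longest_chain(sub: Graph, seed: str) -> List[Triple]:
    """Exhaustive DFS for the longest simple path that starts at the seed.

    Exponential in the worst case, so only practical for small subgraphs.
    Ties keep the first path found.
    """
    best: List[Triple] = []

    def dfs(node: str, visited: Set[str], path: List[Triple]) -> None:
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for rel, nbr in sub.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                path.append((node, rel, nbr))
                dfs(nbr, visited, path)
                path.pop()
                visited.remove(nbr)

    dfs(seed, {seed}, [])
    return best


def chain_to_qa(chain: List[Triple]) -> Tuple[str, str]:
    """Render a chain as a templated multi-hop question; the answer is the last tail."""
    relations = " -> ".join(rel for _, rel, _ in chain)
    question = (
        f"Starting from '{chain[0][0]}', follow {relations}. "
        "Which entity do you reach?"
    )
    return question, chain[-1][2]


def synthesize_qa(
    graph: Graph, mention_counts: Dict[str, int], k: int = 3, min_hops: int = 2
) -> Optional[Tuple[str, str]]:
    """Try seeds from rarest to most common (hypothetical rarity proxy)."""
    for seed in sorted(mention_counts, key=lambda e: (mention_counts[e], e)):
        chain = longest_chain(k_hop_subgraph(graph, seed, k), seed)
        if len(chain) >= min_hops:
            return chain_to_qa(chain)
    return None  # no seed yields a chain with enough hops


# Toy data with made-up entities, for illustration only.
TOY_GRAPH: Graph = {
    "Drug X": [("treats", "Disease A")],
    "Disease A": [("caused_by", "Gene G"), ("has_symptom", "Fever")],
    "Gene G": [("located_on", "Chromosome 7")],
    "Fever": [],
    "Chromosome 7": [],
}
TOY_COUNTS = {
    "Drug X": 3, "Gene G": 12, "Chromosome 7": 40, "Disease A": 50, "Fever": 900,
}

if __name__ == "__main__":
    print(synthesize_qa(TOY_GRAPH, TOY_COUNTS, k=3))
```

On the toy graph, the rarest seed is "Drug X", and its longest chain runs
through "Disease A" and "Gene G" to "Chromosome 7", giving a three-hop
question. The template is used here only for illustration; the abstract does
not say how the questions are phrased, and exhaustive search would need
pruning or a bounded search on realistically sized subgraphs.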