MedResearcher-R1: Expert-Level Medical Deep Researcher via a Knowledge-Informed Trajectory Synthesis Framework

Abstract

Recent Large Language Model (LLM)-based agents have shown impressive capabilities across multiple domains, exemplified by deep research systems that perform strongly on complex information-seeking and synthesis tasks. General-purpose deep research agents nevertheless struggle with medical-domain challenges, as evidenced by leading proprietary systems achieving limited accuracy on complex medical benchmarks. The key limitations are: (1) the model lacks sufficiently dense medical knowledge for clinical reasoning, and (2) the framework is constrained by the absence of specialized retrieval tools tailored to medical contexts.

We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting the longest chains from subgraphs around rare medical entities to generate complex multi-hop question-answer pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2100+ diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions.

Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our MedResearcher-R1-32B model establishes new state-of-the-art results on medical benchmarks while maintaining competitive performance on general deep research tasks. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains.
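
The first innovation, extracting the longest chain from a subgraph around a rare entity and turning it into a multi-hop question, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy triples, the manually chosen seed entity, the hop radius `k`, the exhaustive search, and the question template in `make_qa`. The authors' actual pipeline may differ substantially.

```python
from collections import deque

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("Fabry disease", "caused_by", "GLA mutation"),
    ("GLA mutation", "reduces", "alpha-galactosidase A"),
    ("alpha-galactosidase A", "degrades", "globotriaosylceramide"),
    ("Fabry disease", "treated_with", "agalsidase beta"),
    ("hypertension", "treated_with", "lisinopril"),
]


def build_adjacency(triples):
    """Undirected adjacency map: node -> list of (relation, neighbor)."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((rel, head))
    return adj


def k_hop_nodes(adj, seed, k):
    """Return the set of nodes within k hops of seed (breadth-first search)."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the hop radius
        for _, nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def longest_chain(adj, nodes):
    """Longest simple path restricted to `nodes`.

    Exhaustive depth-first search: exponential in general, so only
    suitable for the small subgraphs used in this sketch.
    """
    best = []

    def dfs(path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for _, nbr in adj[path[-1]]:
            if nbr in nodes and nbr not in visited:
                visited.add(nbr)
                path.append(nbr)
                dfs(path, visited)
                path.pop()
                visited.remove(nbr)

    for start in sorted(nodes):  # sorted for deterministic output
        dfs([start], {start})
    return best


def make_qa(chain):
    """Turn a chain into a (question, answer) pair with a hypothetical template."""
    hops = len(chain) - 1
    question = f"Which entity is reached {hops} hops from '{chain[0]}' via '{chain[1]}'?"
    return question, chain[-1]


ADJ = build_adjacency(TRIPLES)
# The seed is picked by hand here; a real system would select rare entities.
SUBGRAPH = k_hop_nodes(ADJ, "Fabry disease", k=3)
CHAIN = longest_chain(ADJ, SUBGRAPH)
QUESTION, ANSWER = make_qa(CHAIN)
print(" -> ".join(CHAIN))
print(QUESTION, "=>", ANSWER)
```

Answering the generated question requires traversing every intermediate entity on the chain, which is the property that makes such pairs multi-hop. A realistic version would also need a principled rarity criterion and natural-language question generation, neither of which is specified in the abstract.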
