OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

Abstract

Deep search capabilities have become indispensable for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe is a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that a simple SFT approach can be surprisingly powerful for training frontier search agents when it is fueled with informative, high-difficulty trajectories. We establish a stronger baseline by introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set for broader functionality, and applying strict low-step filtering. Trained on merely 10.6k data points, OpenSeeker-v2 achieves state-of-the-art performance among 30B-scale agents using the ReAct paradigm on four benchmarks: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench. It surpasses even Tongyi DeepResearch, which is trained with a heavy CPT+SFT+RL pipeline and achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 is the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and to share our simple yet effective findings, making frontier search agent research more accessible to the community.
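To make the reported comparison easier to read, the sketch below computes the per-benchmark margin of OpenSeeker-v2 over Tongyi DeepResearch in percentage points. The scores are the ones stated in the abstract; the `SCORES` layout and the `margins` helper are illustrative names, not part of any released code.

```python
# Scores (%) as reported in the abstract: (OpenSeeker-v2, Tongyi DeepResearch).
# The dictionary layout and function name are assumptions made for illustration.
SCORES = {
    "BrowseComp": (46.0, 43.4),
    "BrowseComp-ZH": (58.1, 46.7),
    "Humanity's Last Exam": (34.6, 32.9),
    "xbench": (78.0, 75.0),
}


def margins(scores):
    """Return the absolute gap in percentage points for each benchmark."""
    # Round to one decimal place to avoid floating-point noise in the subtraction.
    return {name: round(ours - base, 1) for name, (ours, base) in scores.items()}


if __name__ == "__main__":
    for name, delta in margins(SCORES).items():
        print(f"{name}: +{delta:.1f} points")
```

Under these numbers the gap is largest on BrowseComp-ZH (11.4 points) and smallest on Humanity's Last Exam (1.7 points).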
