OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
May 5, 2026
Authors: Yuwen Du, Rui Ye, Shuo Tang, Keduan Huang, Xinyu Zhu, Yuzhu Cai, Siheng Chen
cs.AI
Abstract
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach can be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications — scaling up the knowledge graph for richer exploration, expanding the tool set for broader functionality, and applying strict low-step filtering — we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across four benchmarks among 30B-scale agents using the ReAct paradigm: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch, which was trained with a heavy CPT+SFT+RL pipeline and achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
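The "strict low-step filtering" step can be sketched as a simple pass over synthesized trajectories. The exact criterion is not specified in this abstract; the sketch below assumes the filter drops trajectories that the agent solves in too few tool-use steps, on the premise that such problems are too easy to contribute high-difficulty training signal. The `Trajectory` class and `min_steps` threshold are illustrative, not the authors' actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """A hypothetical ReAct-style trajectory: thought/action/observation steps."""
    question: str
    steps: list = field(default_factory=list)
    answer: str = ""

def filter_low_step(trajectories, min_steps=4):
    """Keep only trajectories that required at least `min_steps` steps.

    Assumed interpretation of "strict low-step filtering": short
    trajectories indicate easy questions and are discarded.
    """
    return [t for t in trajectories if len(t.steps) >= min_steps]

# Toy example: a one-step (easy) trajectory is dropped, a six-step one kept.
easy = Trajectory("single-lookup question", steps=[("think", "search", "obs")])
hard = Trajectory("multi-hop question", steps=[("think", "search", "obs")] * 6)
kept = filter_low_step([easy, hard], min_steps=4)
print([t.question for t in kept])
```

A threshold like this is the kind of knob one would tune against downstream SFT quality; too high a cutoff shrinks the dataset, too low readmits uninformative examples.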