InfoAgent: Advancing Autonomous Information-Seeking Agents
September 29, 2025
Authors: Gongrui Zhang, Jialiang Zhu, Ruiqi Yang, Kai Qiu, Miaosen Zhang, Zhirong Wu, Qi Dai, Bei Liu, Chong Luo, Zhengyuan Yang, Linjie Li, Lijuan Wang, Weizhu Chen, Yuan Zhang, Xin Li, Zhaoyi Liu, Xin Geng, Baining Guo
cs.AI
Abstract
Building Large Language Model agents that expand their capabilities by
interacting with external tools represents a new frontier in AI research and
applications. In this paper, we introduce InfoAgent, a deep research agent
powered by an innovative data synthesis pipeline and orchestrated web search
tools. To construct challenging, hard-to-find queries, we build entity trees and
apply sub-tree sampling with entity fuzzification to systematically increase
question difficulty. Unlike prior work that relies heavily on commercial search
tools, we develop a dedicated self-hosted search infrastructure, enhancing
transparency of agent environments and facilitating further advancement of
agent capacity. We evaluate the effectiveness of our data pipeline by measuring
the average number of tool calls required to correctly answer a question, and
also show that our agent yields better performance when equipped with our
tools. InfoAgent is post-trained from Qwen3-14B using a two-stage
recipe: cold-start supervised fine-tuning to instill long-horizon search
behaviors, followed by reinforcement learning, which significantly improves
reasoning-driven tool use. With our methods, InfoAgent achieves 15.3% accuracy
on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS, outperforming
prior open-source deep research agents such as WebSailor-72B and DeepDive-32B.
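
The abstract names entity trees, sub-tree sampling, and entity fuzzification but does not spell out the procedure. The following Python sketch is one possible reading of that idea, not the authors' implementation. The `EntityNode` class, its `fuzzy` field, the `keep_prob` and `fuzz_prob` parameters, and the toy tree are all assumptions introduced here for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EntityNode:
    name: str           # exact entity name
    fuzzy: str          # vaguer description used when the name is hidden (assumed field)
    relation: str = ""  # how the parent relates to this entity
    children: list = field(default_factory=list)


def sample_subtree(node, rng, keep_prob=0.5):
    """Copy the tree rooted at `node`, keeping each child branch with probability keep_prob."""
    kept = [
        sample_subtree(child, rng, keep_prob)
        for child in node.children
        if rng.random() < keep_prob
    ]
    return EntityNode(node.name, node.fuzzy, node.relation, kept)


def fuzzified_clues(node, rng, fuzz_prob=0.7, label="the target entity"):
    """Turn each edge into a textual clue, hiding entity names with probability fuzz_prob."""
    clues = []
    for child in node.children:
        child_label = child.fuzzy if rng.random() < fuzz_prob else child.name
        clues.append(f"{label} {child.relation} {child_label}")
        clues.extend(fuzzified_clues(child, rng, fuzz_prob, child_label))
    return clues


# Toy entity tree (illustrative only); the root is the answer to the question.
tree = EntityNode(
    "Ada Lovelace",
    "a 19th-century mathematician",
    children=[
        EntityNode(
            "Charles Babbage",
            "a 19th-century English inventor of mechanical calculating engines",
            "collaborated with",
            [EntityNode(
                "the Analytical Engine",
                "a proposed general-purpose mechanical computer",
                "designed",
            )],
        ),
        EntityNode("Lord Byron", "a Romantic-era English poet", "was the daughter of"),
    ],
)

if __name__ == "__main__":
    rng = random.Random(0)
    subtree = sample_subtree(tree, rng, keep_prob=0.8)
    for clue in fuzzified_clues(subtree, rng):
        print(clue)
```

In this reading, a deeper sampled sub-tree and a higher `fuzz_prob` force a solver to resolve more intermediate entities through search before reaching the root. That matches the intent of measuring difficulty by the average number of tool calls needed for a correct answer.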