InfoAgent: Advancing Autonomous Information-Seeking Agents

Abstract

Building Large Language Model (LLM) agents that expand their capabilities by interacting with external tools represents a new frontier in AI research and applications. In this paper, we introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools. To construct challenging, hard-to-find queries, we build entity trees and apply sub-tree sampling with entity fuzzification to systematically increase question difficulty. Unlike prior work that relies heavily on commercial search tools, we develop a dedicated self-hosted search infrastructure, which makes the agent environment more transparent and facilitates further improvement of agent capabilities. We evaluate the effectiveness of our data pipeline by measuring the average number of tool calls required to correctly answer a question, and we also show that our agent performs better when equipped with our tools. InfoAgent is post-trained from Qwen3-14B using a two-stage recipe: cold-start supervised fine-tuning to instill long-horizon search behaviors, followed by reinforcement learning, which significantly improves reasoning-driven tool use. With our methods, InfoAgent achieves 15.3% accuracy on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS, outperforming prior open-source deep research agents such as WebSailor-72B and DeepDive-32B.
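
As a minimal sketch of the difficulty measure mentioned in the abstract (the average number of tool calls required to correctly answer a question), the snippet below averages tool-call counts over correctly answered attempts only. The `Trajectory` record, its field names, and the sample values are hypothetical and chosen purely for illustration; the abstract does not specify how trajectories are logged or how the average is aggregated.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Trajectory:
    """One agent attempt at a question (hypothetical record format)."""
    question_id: str
    num_tool_calls: int
    is_correct: bool


def avg_tool_calls_when_correct(trajectories: Iterable[Trajectory]) -> Optional[float]:
    """Mean tool-call count over correct attempts; None if no attempt is correct."""
    counts = [t.num_tool_calls for t in trajectories if t.is_correct]
    return mean(counts) if counts else None


# Illustrative data only, not results from the paper.
sample = [
    Trajectory("q1", 12, True),
    Trajectory("q2", 30, False),  # incorrect attempts are excluded from the average
    Trajectory("q3", 20, True),
]
print(avg_tool_calls_when_correct(sample))
```

Under this reading, a dataset whose correct answers need more tool calls on average would be considered harder, since each question demands a longer search before it can be resolved.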
