WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
September 8, 2025
Authors: Junteng Liu, Yunji Li, Chi Zhang, Jingyang Li, Aili Chen, Ke Ji, Weiyu Cheng, Zijia Wu, Chengyu Du, Qidi Xu, Jiayuan Song, Zhengmao Zhu, Wenhu Chen, Pengyu Zhao, Junxian He
cs.AI
Abstract
The paradigm of Large Language Models (LLMs) has increasingly shifted toward
agentic applications, where web browsing capabilities are fundamental for
retrieving information from diverse online sources. However, existing
open-source web agents either demonstrate limited information-seeking abilities
on complex tasks or lack transparent implementations. In this work, we identify
the key challenge as the scarcity of challenging information-seeking data. To
address this limitation, we introduce WebExplorer: a systematic
data generation approach using model-based exploration and iterative,
long-to-short query evolution. This method creates challenging query-answer
pairs that require multi-step reasoning and complex web navigation. By
leveraging our curated high-quality dataset, we develop an advanced web agent,
WebExplorer-8B, through supervised fine-tuning followed by reinforcement
learning. Our model supports a 128K context length and up to 100 tool-calling
turns, enabling long-horizon problem solving. Across diverse
information-seeking benchmarks, WebExplorer-8B achieves state-of-the-art
performance at its scale. Notably, as an 8B-parameter model, WebExplorer-8B
searches effectively over an average of 16 turns after RL training, achieves
higher accuracy than WebSailor-72B on BrowseComp-en/zh, and attains the best
performance among models with up to 100B parameters on WebWalkerQA and FRAMES.
Beyond these information-seeking tasks, our model also achieves strong
generalization on the HLE benchmark even though it is trained only on
knowledge-intensive QA data. These results highlight our approach as a
practical path toward long-horizon web agents.
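A long-horizon agent with a fixed budget of tool-calling turns can be pictured as a budgeted decide/act/observe loop. The following is a minimal Python sketch of such a loop under stated assumptions. It is not the authors' implementation. The action types `ToolCall` and `FinalAnswer`, the function `run_agent`, the scripted policy, and the stub tool `fake_search` are all hypothetical. The loop also omits any handling of the 128K-token context window.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Union

# Hypothetical action types: at each step the policy either calls a tool or answers.
@dataclass
class ToolCall:
    name: str
    argument: str

@dataclass
class FinalAnswer:
    text: str

Action = Union[ToolCall, FinalAnswer]

@dataclass
class AgentResult:
    answer: Optional[str]  # None if the budget ran out before an answer
    tool_calls_used: int
    history: list = field(default_factory=list)

MAX_TOOL_CALLS = 100  # turn budget taken from the abstract; everything else is assumed

def run_agent(
    question: str,
    policy: Callable[[list], Action],
    tools: dict,
    max_tool_calls: int = MAX_TOOL_CALLS,
) -> AgentResult:
    """Hypothetical loop: alternate policy decisions and tool executions
    until the policy answers or the tool-call budget is exhausted."""
    history = [("user", question)]
    calls = 0
    while calls < max_tool_calls:
        action = policy(history)
        if isinstance(action, FinalAnswer):
            history.append(("assistant", action.text))
            return AgentResult(action.text, calls, history)
        # Run the requested tool and append its observation to the history.
        observation = tools[action.name](action.argument)
        history.append(("tool_call", f"{action.name}({action.argument})"))
        history.append(("observation", observation))
        calls += 1
    # Budget exhausted without a final answer.
    return AgentResult(None, calls, history)

# --- Usage with a hypothetical stub tool and a scripted policy ---
def fake_search(query: str) -> str:
    return f"results for {query!r}"

def scripted_policy(history: list) -> Action:
    # Search twice, then answer.
    n_obs = sum(1 for role, _ in history if role == "observation")
    if n_obs < 2:
        return ToolCall("search", f"step {n_obs}")
    return FinalAnswer("42")

result = run_agent("toy question", scripted_policy, {"search": fake_search})
print(result.answer, result.tool_calls_used)  # 42 2
```

The hard cap on tool calls guarantees that the loop terminates even when the policy never commits to an answer. In that case the caller receives `answer=None` together with the full interaction history.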