WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
October 28, 2025
Authors: Zhengwei Tao, Haiyang Shen, Baixuan Li, Wenbiao Yin, Jialong Wu, Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Liwen Zhang, Xinyu Wang, Pengjun Xie, Jingren Zhou, Yong Jiang
cs.AI
Abstract
Large Language Model (LLM)-based agents have emerged as a transformative
approach for open-ended problem solving, with information seeking (IS) being a
core capability that enables autonomous reasoning and decision-making. While
prior research has largely focused on improving retrieval depth, we observe
that current IS agents often suffer from low search efficiency, which in turn
constrains overall performance. A key factor underlying this inefficiency is
the sparsity of target entities in training tasks, which limits opportunities
for agents to learn and generalize efficient search behaviors. To address these
challenges, we propose WebLeaper, a framework for constructing high-coverage IS
tasks and generating efficient solution trajectories. We formulate IS as a
tree-structured reasoning problem, enabling a substantially larger set of
target entities to be embedded within a constrained context. Leveraging curated
Wikipedia tables, we propose three variants for synthesizing IS tasks, Basic,
Union, and Reverse-Union, to systematically increase both IS efficiency and
efficacy. Finally, we curate training trajectories by retaining only those that
are simultaneously accurate and efficient, ensuring that the model is optimized
for both correctness and search performance. Extensive experiments in both
basic and comprehensive settings on five IS benchmarks (BrowseComp, GAIA,
xbench-DeepSearch, WideSearch, and Seal-0) demonstrate that our method
consistently improves both effectiveness and efficiency over strong
baselines.
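
The trajectory curation step described above (keeping only trajectories that are both accurate and efficient) can be illustrated with a minimal sketch. The abstract does not define the actual metrics or thresholds, so everything here is an assumption made for illustration: the `Trajectory` fields, the `coverage` and `yield_per_action` measures, and the cutoff values `min_coverage` and `min_yield` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one agent run on an IS task."""
    found: set[str]     # entities the agent returned
    targets: set[str]   # ground-truth target entities for the task
    num_actions: int    # tool calls (searches, page visits) the agent used


def coverage(t: Trajectory) -> float:
    """Assumed accuracy proxy: fraction of target entities recovered."""
    if not t.targets:
        return 0.0
    return len(t.found & t.targets) / len(t.targets)


def yield_per_action(t: Trajectory) -> float:
    """Assumed efficiency proxy: correct target entities per action."""
    if t.num_actions == 0:
        return 0.0
    return len(t.found & t.targets) / t.num_actions


def filter_trajectories(
    trajectories: list[Trajectory],
    min_coverage: float = 0.8,  # hypothetical threshold, not from the paper
    min_yield: float = 0.5,     # hypothetical threshold, not from the paper
) -> list[Trajectory]:
    """Keep only trajectories that pass both the accuracy and efficiency checks."""
    return [
        t for t in trajectories
        if coverage(t) >= min_coverage and yield_per_action(t) >= min_yield
    ]
```

Under these assumptions, a run that recovers every target but needs many actions is dropped, as is a run that is cheap but misses most targets. Only runs that pass both checks would be kept as training data.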