ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Abstract

Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust the context budget before the agent reaches a complete solution. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. To adapt agents to this paradigm, we propose ReSum-GRPO, which integrates GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents of varying scales across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of up to 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing existing open-source web agents.
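The abstract names two mechanisms without giving their exact form, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. `resum_rollout` replaces the accumulated history with the original query plus a summary whenever a rough token count exceeds a budget, and each such compression closes one trajectory segment. `broadcast_advantages` computes a GRPO-style group-relative advantage from each trajectory's final reward and copies it to every segment of that trajectory. The function names, the `policy`/`summarize` callables, the whitespace token count, and the budget-based trigger are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   policy(context) -> (next_message, final_answer_or_None)
#   summarize(context) -> compact summary string
Policy = Callable[[List[str]], Tuple[str, Optional[str]]]
Summarizer = Callable[[List[str]], str]


def resum_rollout(query: str, policy: Policy, summarize: Summarizer,
                  budget: int, max_turns: int = 50
                  ) -> Tuple[Optional[str], List[List[str]]]:
    """Run a ReSum-style loop; return (answer or None, list of segments)."""
    context = [query]
    segments: List[List[str]] = []
    for _ in range(max_turns):
        message, answer = policy(context)
        context.append(message)
        if answer is not None:
            segments.append(context)
            return answer, segments
        # Assumed trigger: whitespace token count exceeds the budget.
        if sum(len(m.split()) for m in context) > budget:
            segments.append(context)
            # Restart from the query plus a compact reasoning state.
            context = [query, summarize(context)]
    segments.append(context)
    return None, segments


def broadcast_advantages(rewards: List[float], segment_counts: List[int],
                         eps: float = 1e-6) -> List[List[float]]:
    """GRPO-style group-normalized advantage per trajectory, copied to each segment."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    # Every segment of trajectory i shares trajectory i's advantage.
    return [[a] * n for a, n in zip(advantages, segment_counts)]


if __name__ == "__main__":
    steps = iter(range(100))

    def toy_policy(context: List[str]) -> Tuple[str, Optional[str]]:
        i = next(steps)
        return f"search step {i} found nothing conclusive", ("final answer" if i == 5 else None)

    def toy_summarize(context: List[str]) -> str:
        return "summary: no conclusive evidence yet"

    answer, segs = resum_rollout("who wrote X?", toy_policy, toy_summarize, budget=20)
    print(answer, len(segs))
    print(broadcast_advantages([1.0, 0.0], [len(segs), 1]))
```

Broadcasting one trajectory-level advantage to all segments lets each summary-conditioned segment be trained as a separate sample while still receiving credit for the final outcome; the sketch leaves out the policy-gradient update itself.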
