AgentFold: Long-Horizon Web Agents with Proactive Context Management
October 28, 2025
Authors: Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang
cs.AI
Abstract
LLM-based web agents show immense promise for information seeking, yet their
effectiveness on long-horizon tasks is hindered by a fundamental trade-off in
context management. Prevailing ReAct-based agents suffer from context
saturation as they accumulate noisy, raw histories, while methods that rigidly
summarize the full history at every step risk irreversibly losing critical
details. To address both problems, we introduce AgentFold, a novel agent paradigm
centered on proactive context management, inspired by the human cognitive
process of retrospective consolidation. AgentFold treats its context as a
dynamic cognitive workspace to be actively sculpted, rather than a passive log
to be filled. At each step, it executes a learned "folding" operation, which
manages its historical trajectory at multiple scales: it can perform granular
condensations to preserve vital, fine-grained details, or deep consolidations
to abstract away entire multi-step sub-tasks. The results on prominent
benchmarks are striking: with simple supervised fine-tuning (without continual
pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp
and 47.3% on BrowseComp-ZH. Notably, this performance matches or surpasses
open-source models of a dramatically larger scale, such as
DeepSeek-V3.1-671B-A37B, and also surpasses leading proprietary agents such as
OpenAI's o4-mini.
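
The two folding scales described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Workspace` and `Block` classes, the choice to keep only the latest step in raw form, and the rule that a fold must begin at an existing block boundary are all assumptions made for illustration. In AgentFold the summary text would be generated by the model itself; here it is passed in as a plain string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """One folded entry: a summary covering steps start..end (inclusive)."""
    start: int
    end: int
    summary: str


@dataclass
class Workspace:
    """Hypothetical context: folded summaries plus the latest raw step."""
    blocks: List[Block] = field(default_factory=list)
    latest: Optional[str] = None  # unfolded record of the most recent step
    step: int = 0

    def observe(self, raw: str) -> None:
        """Record the raw result of a new step."""
        self.step += 1
        self.latest = raw

    def fold(self, start: int, summary: str) -> None:
        """Replace steps start..current with one summary block.

        start == current step: granular condensation (only the latest step).
        start <  current step: deep consolidation (earlier blocks are merged).
        """
        if self.latest is None:
            raise ValueError("nothing to fold")
        # Assumption: a fold may only begin at an existing block boundary.
        valid_starts = {b.start for b in self.blocks} | {self.step}
        if start not in valid_starts:
            raise ValueError("start must align with a block boundary")
        # Keep blocks that end before the folded range, drop the rest.
        self.blocks = [b for b in self.blocks if b.end < start]
        self.blocks.append(Block(start, self.step, summary))
        self.latest = None

    def render(self) -> str:
        """Serialize the workspace as it might be shown to the model."""
        lines = []
        for b in self.blocks:
            span = f"step {b.start}" if b.start == b.end else f"steps {b.start}-{b.end}"
            lines.append(f"[{span}] {b.summary}")
        if self.latest is not None:
            lines.append(f"[step {self.step}, raw] {self.latest}")
        return "\n".join(lines)


ws = Workspace()
ws.observe("search results mentioning candidate X")
ws.fold(1, "Found candidate X")            # granular condensation
ws.observe("page B: no relevant facts")
ws.fold(2, "Page B was irrelevant")        # granular condensation
ws.observe("page C: no relevant facts")
ws.fold(2, "B and C were dead ends")       # deep consolidation of steps 2-3
print(ws.render())
```

In this sketch the first step stays as its own fine-grained block, while the two unproductive steps collapse into a single entry, so the context grows with the number of retained blocks rather than with the number of steps taken.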