AgentFold: Long-Horizon Web Agents with Proactive Context Management
October 28, 2025
Authors: Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang
cs.AI
Abstract
LLM-based web agents show immense promise for information seeking, yet their
effectiveness on long-horizon tasks is hindered by a fundamental trade-off in
context management. Prevailing ReAct-based agents suffer from context
saturation as they accumulate noisy, raw histories, while methods that rigidly
summarize the full history at each step risk the irreversible loss of critical
details. To address both problems, we introduce AgentFold, a novel agent paradigm
centered on proactive context management, inspired by the human cognitive
process of retrospective consolidation. AgentFold treats its context as a
dynamic cognitive workspace to be actively sculpted, rather than a passive log
to be filled. At each step, it learns to execute a "folding" operation, which
manages its historical trajectory at multiple scales: it can perform granular
condensations to preserve vital, fine-grained details, or deep consolidations
to abstract away entire multi-step sub-tasks. The results on prominent
benchmarks are striking: with simple supervised fine-tuning (without continual
pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp
and 47.3% on BrowseComp-ZH. Notably, this performance not only matches or
surpasses open-source models of a dramatically larger scale, such as
DeepSeek-V3.1-671B-A37B, but also exceeds that of leading proprietary agents
such as OpenAI's o4-mini.
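
The two scales of folding described in the abstract can be illustrated with a small data-structure sketch. This is not the authors' implementation: the names `Block`, `Workspace`, `observe`, `fold`, and `render` are hypothetical, and the summaries are supplied by the caller, whereas in AgentFold the model itself learns what to fold and how to summarize it. The sketch assumes the context consists of a list of summary blocks plus the full record of the latest step, and that each fold replaces everything from a chosen step through the latest step with a single summary.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A summary covering steps start..end (inclusive) of the trajectory."""
    start: int
    end: int
    summary: str


@dataclass
class Workspace:
    """Hypothetical context: folded summaries plus the latest raw step."""
    blocks: list[Block] = field(default_factory=list)
    latest: str = ""  # full record of the most recent, not yet folded step
    step: int = 0     # index of the most recent step (0 means none yet)

    def observe(self, raw: str) -> None:
        """Record a new step in full; it stays raw until the next fold."""
        self.step += 1
        self.latest = raw

    def fold(self, from_step: int, summary: str) -> None:
        """Replace steps from_step..latest with a single summary block.

        from_step == self.step: granular condensation of the latest step only.
        from_step <  self.step: deep consolidation that also absorbs earlier blocks.
        """
        if not 1 <= from_step <= self.step:
            raise ValueError("from_step must lie within the recorded steps")
        if not self.latest:
            raise ValueError("nothing new to fold; call observe() first")
        if any(b.start < from_step <= b.end for b in self.blocks):
            raise ValueError("from_step must align with a block boundary")
        # Keep only the blocks that end before the folded range begins.
        self.blocks = [b for b in self.blocks if b.end < from_step]
        self.blocks.append(Block(from_step, self.step, summary))
        self.latest = ""

    def render(self) -> str:
        """Build the text that would be placed in the model's context."""
        lines = []
        for b in self.blocks:
            label = f"Step {b.start}" if b.start == b.end else f"Steps {b.start}-{b.end}"
            lines.append(f"[{label}] {b.summary}")
        if self.latest:
            lines.append(f"[Step {self.step}, raw] {self.latest}")
        return "\n".join(lines)


# Illustrative trajectory; the tool calls and summaries are made up.
ws = Workspace()
ws.observe("search('A') returned ten results")
ws.fold(1, "Searched A; page P looks promising.")      # granular condensation
ws.observe("visit(P) failed: dead link")
ws.fold(2, "Page P is a dead link.")                   # granular condensation
ws.observe("search('A alternative') found nothing relevant")
ws.fold(1, "The line of inquiry around A was a dead end.")  # deep consolidation
print(ws.render())
```

In this sketch the only difference between the two operations is how far back `from_step` reaches: a granular condensation keeps each step as its own block, while a deep consolidation collapses a finished sub-task into one block and frees the context for later steps.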