Scaling Long-Horizon LLM Agent via Context-Folding
October 13, 2025
Authors: Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen
cs.AI
Abstract
Large language model (LLM) agents are fundamentally constrained by context
length on long-horizon tasks. We introduce Context-Folding, a framework that
empowers agents to actively manage their working context. An agent can
procedurally branch into a sub-trajectory to handle a subtask and then fold it
upon completion, collapsing the intermediate steps while retaining a concise
summary of the outcome. To make this behavior learnable, we develop an
end-to-end reinforcement learning framework FoldGRPO with specific process
rewards to encourage effective task decomposition and context management. On
complex long-horizon tasks (Deep Research and SWE), our folding agent matches
or outperforms the ReAct baselines while using an active context 10× smaller
smaller and significantly outperforms models that rely on summarization-based
context management.
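
As a rough illustration of the branch-and-fold idea described above, the following is a minimal sketch and not the authors' implementation. The class `FoldingContext`, its methods `branch` and `fold`, and the message strings are all hypothetical names chosen for this example. The sketch assumes the working context is a flat list of messages and that the summary is supplied by the caller; the learned policy and the FoldGRPO training procedure are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FoldingContext:
    """Toy working context with branch and fold (hypothetical names)."""

    messages: list[str] = field(default_factory=list)
    # Stack of indices marking where each open sub-trajectory started.
    _branch_starts: list[int] = field(default_factory=list)

    def append(self, message: str) -> None:
        self.messages.append(message)

    def branch(self, subtask: str) -> None:
        # Remember where the sub-trajectory begins, then record the subtask.
        self._branch_starts.append(len(self.messages))
        self.messages.append(f"[branch] {subtask}")

    def fold(self, summary: str) -> int:
        # Drop every message since the matching branch and keep only the
        # summary. Returns the number of messages that were collapsed.
        if not self._branch_starts:
            raise RuntimeError("fold() called with no open branch")
        start = self._branch_starts.pop()
        removed = len(self.messages) - start
        del self.messages[start:]
        self.messages.append(f"[folded] {summary}")
        return removed


ctx = FoldingContext()
ctx.append("task: fix failing test in repo")
ctx.branch("locate the failing test")
ctx.append("ran test suite")
ctx.append("read traceback")
ctx.append("opened test_parser.py")
removed = ctx.fold("failure is in test_parser.py::test_empty_input")
```

In this toy run the intermediate steps of the subtask leave the active context, and only the original task plus the one-line summary remain. A stack of start indices is used so that branches could in principle nest, which is a design choice of this sketch rather than a claim about the paper.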