Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

October 6, 2025
Authors: Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun
cs.AI

Abstract

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence rather than updating weights. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights in favor of concise summaries, and from context collapse, where iterative rewriting erodes details over time. Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines: +10.6% on agents and +8.6% on finance, while significantly reducing adaptation latency and rollout cost. Notably, ACE can adapt effectively without labeled supervision by leveraging natural execution feedback instead. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
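
The generation, reflection, and curation loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Playbook`, `apply_delta`, `adapt`, `toy_generate`, and `toy_reflect` are hypothetical, and the generator and reflector are plain functions standing in for LLM calls. It only aims to show the idea of incremental updates, where each step appends small deltas to an itemized context instead of rewriting the whole context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical role signatures (assumptions, not taken from the paper):
# generator: (task, context) -> (trajectory, execution feedback)
# reflector: (task, trajectory, feedback) -> list of lessons
Generator = Callable[[str, str], Tuple[str, str]]
Reflector = Callable[[str, str, str], List[str]]


@dataclass
class Playbook:
    """Context kept as itemized bullets rather than one rewritten blob."""
    bullets: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Serialize the bullets into the text handed to the generator."""
        return "\n".join(f"[{i}] {text}" for i, text in enumerate(self.bullets))

    def apply_delta(self, lessons: List[str]) -> None:
        """Curation step: append unseen lessons; existing bullets stay untouched."""
        for lesson in lessons:
            if lesson not in self.bullets:
                self.bullets.append(lesson)


def adapt(playbook: Playbook, tasks: List[str],
          generate: Generator, reflect: Reflector) -> Playbook:
    """Run generate -> reflect -> curate once per task."""
    for task in tasks:
        trajectory, feedback = generate(task, playbook.render())
        lessons = reflect(task, trajectory, feedback)
        playbook.apply_delta(lessons)
    return playbook


def toy_generate(task: str, context: str) -> Tuple[str, str]:
    """Toy stand-in for an agent: succeeds only if the context mentions the task."""
    feedback = "success" if task in context else "failure"
    return f"attempt({task})", feedback


def toy_reflect(task: str, trajectory: str, feedback: str) -> List[str]:
    """Toy stand-in for a reflector: emit one lesson per failed attempt."""
    return [] if feedback == "success" else [f"hint for {task}"]


if __name__ == "__main__":
    book = adapt(Playbook(), ["parse-date", "paginate-api", "parse-date"],
                 toy_generate, toy_reflect)
    print(book.render())
```

In this toy run the third task repeats the first, so the generator finds the earlier hint in the context, the reflector returns no new lesson, and the playbook keeps growing only when something new is learned. Earlier bullets are never rewritten, which is the property the abstract credits with avoiding context collapse.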