
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

October 6, 2025
Authors: Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun
cs.AI

Abstract

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence rather than updating weights. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights in favor of concise summaries, and from context collapse, where iterative rewriting erodes details over time. Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines: +10.6% on agents and +8.6% on finance, while significantly reducing adaptation latency and rollout cost. Notably, ACE can adapt effectively without labeled supervision by leveraging natural execution feedback instead. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
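The abstract describes ACE only at a high level: a generation, reflection, and curation loop that grows the context through structured, incremental updates rather than wholesale rewrites. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `Playbook`, `Bullet`, `Delta`, `merge`, and `adapt_step`, the helpful/harmful counters, and the exact-match de-duplication are all hypothetical, and the three role functions are plain callables standing in for LLM calls. What it is meant to show is that the merge step only appends new entries or updates counters on existing ones, so earlier detail is never summarized away, which is the failure the abstract calls context collapse.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Bullet:
    """One itemized playbook entry (hypothetical structure)."""
    bullet_id: int
    text: str
    helpful: int = 0  # times a reflection marked this entry as useful
    harmful: int = 0  # times a reflection marked this entry as misleading


@dataclass
class Playbook:
    """The evolving context: an ordered collection of bullets."""
    bullets: dict[int, Bullet] = field(default_factory=dict)
    next_id: int = 0

    def add(self, text: str) -> int:
        bid = self.next_id
        self.bullets[bid] = Bullet(bid, text)
        self.next_id += 1
        return bid

    def render(self) -> str:
        # Serialize the playbook into the text that would be prepended to prompts.
        return "\n".join(f"[{b.bullet_id}] {b.text}" for b in self.bullets.values())


@dataclass
class Delta:
    """Incremental update proposed by the curator (hypothetical format)."""
    new_bullets: list[str] = field(default_factory=list)
    helpful_ids: list[int] = field(default_factory=list)
    harmful_ids: list[int] = field(default_factory=list)


def merge(playbook: Playbook, delta: Delta) -> None:
    """Apply a delta without rewriting any existing bullet text."""
    for bid in delta.helpful_ids:
        if bid in playbook.bullets:  # ignore IDs that do not exist
            playbook.bullets[bid].helpful += 1
    for bid in delta.harmful_ids:
        if bid in playbook.bullets:
            playbook.bullets[bid].harmful += 1
    existing = {b.text for b in playbook.bullets.values()}
    for text in delta.new_bullets:
        if text not in existing:  # naive exact-match de-duplication
            playbook.add(text)
            existing.add(text)


def adapt_step(
    playbook: Playbook,
    task: str,
    generate: Callable[[str, str], str],
    reflect: Callable[[str, str], list[str]],
    curate: Callable[[list[str], str], Delta],
) -> str:
    """One generate -> reflect -> curate -> merge iteration."""
    trajectory = generate(task, playbook.render())  # attempt the task with the current context
    insights = reflect(task, trajectory)            # e.g. lessons drawn from execution feedback
    delta = curate(insights, playbook.render())     # turn lessons into a small structured delta
    merge(playbook, delta)                          # deterministic append-or-update merge
    return trajectory


if __name__ == "__main__":
    # Stub roles standing in for LLM calls (hypothetical).
    def generate(task: str, context: str) -> str:
        return f"trajectory for {task}"

    def reflect(task: str, trajectory: str) -> list[str]:
        return ["Verify the user's timezone before scheduling."]

    def curate(insights: list[str], context: str) -> Delta:
        return Delta(new_bullets=insights, helpful_ids=[0])

    pb = Playbook()
    pb.add("Check API pagination before assuming a list is complete.")
    for _ in range(2):
        adapt_step(pb, "schedule a meeting", generate, reflect, curate)
    print(pb.render())
    print(pb.bullets[0].helpful)
```

Running the script prints the two bullets followed by `2`: the second `adapt_step` call adds no duplicate entry and only increments the helpful counter of bullet 0. A real system would likely need fuzzier de-duplication (for example, some form of semantic similarity) and a pruning policy, both of which this sketch deliberately omits.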
October 7, 2025