PEEK: Context Maps as an Orientation Cache for Long-Context LLM Agents

Abstract

Large language model (LLM) agents increasingly operate over long, recurring external contexts such as document corpora and code repositories. Across invocations, existing approaches preserve the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue is most needed for repeated same-context workloads: reusable orientation knowledge about the recurring context itself (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful). We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates that knowledge into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than ACE, the state-of-the-art prompt-learning framework. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
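To make the three-module cache policy concrete, the following is a minimal sketch of how a Distiller, a Cartographer, and a priority-based Evictor could fit together around a token-budgeted map. It is an illustration under stated assumptions, not PEEK's implementation: the names `MapEntry`, `ContextMap`, `distill`, and `chart`, the `NOTE:` convention, the `key = text` note format, and the whitespace token counter are all hypothetical, and a real system would presumably use LLM calls for the first two modules and a real tokenizer for the budget.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    key: str         # what the entry describes (hypothetical naming scheme)
    text: str        # the orientation knowledge itself
    priority: float  # higher means more worth keeping


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def distill(trajectory: list[str]) -> list[str]:
    # Hypothetical Distiller: keep only lines marked as reusable knowledge.
    prefix = "NOTE:"
    return [line[len(prefix):].strip() for line in trajectory if line.startswith(prefix)]


def chart(notes: list[str], priority: float) -> list[MapEntry]:
    # Hypothetical Cartographer: turn each "key = text" note into an upsert edit.
    edits = []
    for note in notes:
        key, _, text = note.partition("=")
        edits.append(MapEntry(key.strip(), text.strip(), priority))
    return edits


@dataclass
class ContextMap:
    budget: int  # fixed token budget for the whole map
    entries: dict[str, MapEntry] = field(default_factory=dict)

    def size(self) -> int:
        return sum(count_tokens(e.text) for e in self.entries.values())

    def apply_edits(self, edits: list[MapEntry]) -> None:
        for edit in edits:
            self.entries[edit.key] = edit  # insert or overwrite by key
        self.evict()

    def evict(self) -> None:
        # Priority-based Evictor: drop the lowest-priority entries until the map fits.
        while self.entries and self.size() > self.budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]

    def render(self) -> str:
        # Text that would be placed in the agent's prompt, highest priority first.
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"- {e.key}: {e.text}" for e in ordered)
```

In this sketch, each invocation ends with `cmap.apply_edits(chart(distill(trajectory), priority))`, and the next invocation begins by placing `cmap.render()` in the prompt. Because eviction runs after every batch of edits, the rendered map never exceeds the budget no matter how many invocations have contributed to it.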