PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

Abstract

Large language model (LLM) agents increasingly operate over long, recurring external contexts such as document corpora and code repositories. Across invocations, existing approaches preserve the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue matters most for repeated same-context workloads: reusable orientation knowledge about the recurring context itself (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful). We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates that knowledge into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than ACE, the state-of-the-art prompt-learning framework. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
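To make the idea of a constant-sized map with priority-based eviction more concrete, the following is a minimal sketch, not the PEEK implementation. Everything in it is an assumption made for illustration: the names `Entry`, `ContextMap`, `upsert`, `evict`, and `render`, the numeric priorities, the sample entries, and the whitespace word count that stands in for a real tokenizer. The `upsert` call loosely plays the role of a structured edit such as a Cartographer might emit, and `evict` plays the role of an Evictor; the Distiller, which would presumably rely on a language model, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str         # hypothetical identifier for one piece of orientation knowledge
    text: str        # the knowledge itself, as it would appear in the prompt
    priority: float  # hypothetical usefulness score; higher means keep longer

    def tokens(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return len(self.text.split())


@dataclass
class ContextMap:
    budget: int  # maximum number of tokens the map may occupy in the prompt
    entries: dict[str, Entry] = field(default_factory=dict)

    def total_tokens(self) -> int:
        return sum(e.tokens() for e in self.entries.values())

    def upsert(self, entry: Entry) -> None:
        # A structured edit: add a new entry or overwrite the one with the same key.
        self.entries[entry.key] = entry

    def evict(self) -> list[str]:
        # Drop lowest-priority entries first until the map fits the budget.
        evicted = []
        for e in sorted(self.entries.values(), key=lambda e: e.priority):
            if self.total_tokens() <= self.budget:
                break
            del self.entries[e.key]
            evicted.append(e.key)
        return evicted

    def render(self) -> str:
        # Text to embed in the agent's prompt, highest priority first.
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"- {e.text}" for e in ordered)


# Illustrative usage with made-up entries about a hypothetical code repository.
cmap = ContextMap(budget=12)
cmap.upsert(Entry("layout", "src/ holds parsers; tests/ mirrors src/", 0.9))
cmap.upsert(Entry("const", "MAX_RETRIES is defined in config.py", 0.6))
cmap.upsert(Entry("misc", "README mentions a deprecated CLI flag", 0.2))
evicted = cmap.evict()
print(evicted)
print(cmap.render())
```

In this sketch the three entries total 17 word-tokens against a budget of 12, so the lowest-priority entry is dropped and the remaining map fits. A real policy would need a proper tokenizer and some way to update priorities as entries prove useful; both are left out here.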