PEEK: Context Maps as Orientation Caches for Long-Context LLM Agents

Abstract

Large language model (LLM) agents increasingly operate over long and recurring external contexts, such as document corpora and code repositories. Across invocations, existing approaches preserve the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue is most needed for workloads that repeatedly run over the same context: reusable orientation knowledge about the recurring context itself (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful). We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates it into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3–34.0% while using 93–145 fewer iterations and incurring 1.7–5.8× lower cost than the state-of-the-art prompt-learning framework, ACE. On context learning, PEEK improves the solving rate and rubric accuracy by 6.0–14.0% and 7.8–12.1%, respectively, at 1.4× lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
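The three-module cache policy can be pictured with a minimal sketch. Everything below is an illustrative assumption, not PEEK's implementation: the names `ContextMap`, `Entry`, `Edit`, `distill`, and `to_edits` are hypothetical, tokens are approximated by whitespace-separated words instead of a real tokenizer, and "inference-time signals" are reduced to `(key, note, times_useful)` tuples. The sketch only shows the overall shape: distilled knowledge becomes structured upsert/delete edits on a keyed map, and an evictor drops the lowest-priority entries until the map fits a fixed token budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


@dataclass
class Entry:
    key: str         # e.g. a section, entity, constant, or schema name
    text: str        # orientation note shown to the agent
    priority: float  # higher means more worth keeping


@dataclass
class Edit:
    op: str          # hypothetical edit vocabulary: "upsert" or "delete"
    key: str
    text: str = ""
    priority: float = 0.0


def distill(signals: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    # Hypothetical Distiller stand-in: keep only notes that helped at least once.
    return [s for s in signals if s[2] > 0]


def to_edits(knowledge: list[tuple[str, str, int]]) -> list[Edit]:
    # Hypothetical Cartographer stand-in: turn each note into a keyed upsert,
    # using its usefulness count as the eviction priority.
    return [Edit("upsert", key, note, float(hits)) for key, note, hits in knowledge]


@dataclass
class ContextMap:
    budget: int  # fixed token budget for the map in the prompt
    entries: dict[str, Entry] = field(default_factory=dict)

    def tokens(self) -> int:
        return sum(count_tokens(e.text) for e in self.entries.values())

    def apply(self, edits: list[Edit]) -> list[str]:
        # Apply structured edits by key, then enforce the budget.
        for edit in edits:
            if edit.op == "upsert":
                self.entries[edit.key] = Entry(edit.key, edit.text, edit.priority)
            elif edit.op == "delete":
                self.entries.pop(edit.key, None)
            else:
                raise ValueError(f"unknown op: {edit.op}")
        return self.evict()

    def evict(self) -> list[str]:
        # Evictor stand-in: remove lowest-priority entries until within budget.
        evicted = []
        while self.entries and self.tokens() > self.budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]
            evicted.append(victim.key)
        return evicted

    def render(self) -> str:
        # Text that would be placed in the agent's prompt, highest priority first.
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"- {e.key}: {e.text}" for e in ordered)


if __name__ == "__main__":
    cmap = ContextMap(budget=10)
    signals = [
        ("schema", "orders table has id, user_id, total", 5),
        ("layout", "docs split by year under /archive", 3),
        ("noise", "unrelated scratch note", 0),
    ]
    print("evicted:", cmap.apply(to_edits(distill(signals))))
    print(cmap.render())
```

Because eviction runs after every batch of edits, the rendered map never exceeds the budget in this sketch, which is what keeps it constant-sized across invocations; a real system would presumably use the model's tokenizer and a richer priority signal than a usefulness count.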