PEEK: Context Maps as Orientation Caches for Long-Context LLM Agents

Abstract

Large language model (LLM) agents increasingly operate over long, recurring external contexts such as document corpora and code repositories. Across invocations, existing approaches preserve the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue repeated same-context workloads need most: reusable orientation knowledge about the recurring context itself, e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful. We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives the agent a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates this knowledge into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than ACE, the state-of-the-art prompt-learning framework. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
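The abstract does not specify how entries are scored or how tokens are counted, so the following is only a minimal, hypothetical sketch of the general idea of a priority-based evictor enforcing a fixed token budget on a context map. All names (`MapEntry`, `ContextMap`, `count_tokens`, `apply_edit`) and the numeric priority scores are illustrative assumptions, not PEEK's actual interface; `apply_edit` loosely stands in for applying a Cartographer edit, the Distiller is omitted, and whitespace word counts substitute for a real tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MapEntry:
    # Hypothetical map item, e.g. a schema, constant, or section summary.
    key: str
    text: str
    priority: float  # assumed score: higher means more valuable to keep


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


@dataclass
class ContextMap:
    budget: int  # fixed token budget for the whole map
    entries: dict[str, MapEntry] = field(default_factory=dict)

    def apply_edit(self, entry: MapEntry) -> list[str]:
        # Structured edit (insert or overwrite by key), then enforce the budget.
        self.entries[entry.key] = entry
        return self.evict()

    def total_tokens(self) -> int:
        return sum(count_tokens(e.text) for e in self.entries.values())

    def evict(self) -> list[str]:
        # Drop the lowest-priority entries until the map fits the budget.
        evicted: list[str] = []
        while self.entries and self.total_tokens() > self.budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]
            evicted.append(victim.key)
        return evicted

    def render(self) -> str:
        # Serialize entries, highest priority first, for the agent's prompt.
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"[{e.key}] {e.text}" for e in ordered)


if __name__ == "__main__":
    cmap = ContextMap(budget=12)
    cmap.apply_edit(MapEntry("layout", "repo has src tests docs", 0.9))         # 5 tokens
    cmap.apply_edit(MapEntry("schema", "orders table id user_id total", 0.7))   # 5 tokens
    cmap.apply_edit(MapEntry("note", "README mentions legacy build script", 0.2))  # 15 > 12
    print(sorted(cmap.entries))  # ['layout', 'schema']
```

Because the budget is checked after every edit, the rendered map stays constant-sized no matter how many edits arrive; a real system would likely need a richer priority signal (for example, how often an entry proved useful) than the fixed scores assumed here.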