Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Abstract

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence rather than updating weights. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights in favor of concise summaries, and from context collapse, where iterative rewriting erodes details over time. Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines: +10.6% on agents and +8.6% on finance, while significantly reducing adaptation latency and rollout cost. Notably, ACE adapts effectively without labeled supervision, relying instead on natural execution feedback. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
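
To make the idea of structured, incremental updates concrete, the following is a minimal sketch of a playbook that is edited through small deltas instead of being rewritten wholesale. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify data structures, so `Bullet`, `Playbook`, `apply_delta`, the `add`/`tag` operations, and the helpful/harmful counters are all hypothetical names chosen here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bullet:
    """One playbook entry: a strategy plus simple usefulness counters (hypothetical)."""
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Playbook:
    """Context stored as itemized bullets instead of one rewritable blob (hypothetical)."""
    bullets: Dict[str, Bullet] = field(default_factory=dict)

    def apply_delta(self, delta: List[dict]) -> None:
        """Merge a small batch of edits; bullets not named in the delta are left untouched."""
        for op in delta:
            if op["op"] == "add":
                # setdefault keeps an existing bullet rather than overwriting it,
                # so earlier detail cannot be erased by a later addition.
                self.bullets.setdefault(op["id"], Bullet(op["text"]))
            elif op["op"] == "tag":
                bullet = self.bullets[op["id"]]
                if op["helpful"]:
                    bullet.helpful += 1
                else:
                    bullet.harmful += 1

    def render(self) -> str:
        """Serialize the playbook into text that could be placed in a model's context."""
        return "\n".join(f"[{bid}] {b.text}" for bid, b in self.bullets.items())


playbook = Playbook()
# First round: a strategy is added to the playbook.
playbook.apply_delta([{"op": "add", "id": "s1", "text": "Check API docs before calling."}])
# Second round: feedback marks s1 as helpful and a new strategy is appended.
playbook.apply_delta([
    {"op": "tag", "id": "s1", "helpful": True},
    {"op": "add", "id": "s2", "text": "Paginate list endpoints until empty."},
])
print(playbook.render())
```

Because each round only appends or annotates individual entries, earlier strategies survive later rounds unchanged, which is the property the abstract contrasts with iterative rewriting.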
