Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Abstract

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence rather than updating model weights. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights in favor of concise summaries, and from context collapse, where iterative rewriting erodes details over time. Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines (+10.6% on agent tasks and +8.6% on finance) while significantly reducing adaptation latency and rollout cost. Notably, ACE can adapt effectively without labeled supervision, leveraging natural execution feedback instead. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
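The abstract describes ACE's structured, incremental updates only at a high level. The sketch below is a minimal illustration under stated assumptions, not ACE's actual implementation: it assumes the playbook is stored as itemized bullets and that each update arrives as a small delta. The names `Bullet`, `Delta`, and `Playbook`, the helpful/harmful counters, the example bullet texts, and the exact-match de-duplication are all hypothetical. The point it shows is the contrast with monolithic rewriting: new knowledge is merged into the existing context, and existing entries are never regenerated, so earlier details are not eroded.

```python
from dataclasses import dataclass, field


@dataclass
class Bullet:
    """One itemized piece of knowledge in the playbook (hypothetical schema)."""
    bullet_id: str
    section: str
    content: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Delta:
    """A small, localized update a curation step might propose (hypothetical)."""
    new_bullets: list[Bullet] = field(default_factory=list)
    # Maps an existing bullet_id to "helpful" or "harmful" feedback.
    feedback: dict[str, str] = field(default_factory=dict)


class Playbook:
    def __init__(self) -> None:
        self.bullets: dict[str, Bullet] = {}

    def apply(self, delta: Delta) -> None:
        """Merge a delta without rewriting any existing bullet's content."""
        known = {b.content for b in self.bullets.values()}
        for bullet in delta.new_bullets:
            # Exact-match de-duplication only; an assumption for brevity.
            if bullet.content in known or bullet.bullet_id in self.bullets:
                continue
            self.bullets[bullet.bullet_id] = bullet
            known.add(bullet.content)
        # Feedback only adjusts counters; bullet text is left untouched.
        for bullet_id, tag in delta.feedback.items():
            bullet = self.bullets.get(bullet_id)
            if bullet is None:
                continue
            if tag == "helpful":
                bullet.helpful += 1
            elif tag == "harmful":
                bullet.harmful += 1

    def render(self) -> str:
        """Serialize the playbook into a context string, grouped by section."""
        lines: list[str] = []
        for section in sorted({b.section for b in self.bullets.values()}):
            lines.append(f"## {section}")
            for b in self.bullets.values():
                if b.section == section:
                    lines.append(f"[{b.bullet_id}] {b.content}")
        return "\n".join(lines)


# Usage with hypothetical lessons: the second delta adds one new bullet,
# skips a duplicate, and records feedback on an earlier bullet.
pb = Playbook()
pb.apply(Delta(new_bullets=[
    Bullet("s1", "strategies", "Paginate API results until exhausted."),
]))
pb.apply(Delta(
    new_bullets=[
        Bullet("s2", "pitfalls", "Check auth token expiry before calls."),
        Bullet("s3", "strategies", "Paginate API results until exhausted."),
    ],
    feedback={"s1": "helpful"},
))
print(pb.render())
```

In the framework as described, separate generation and reflection steps would produce the lessons that a curator turns into deltas like these; the stubbed-in bullet texts stand in for that output. Because `apply` only appends or annotates, the context can grow with long-context models instead of being compressed on every iteration.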
