A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Abstract

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn, terminal-centric agentic tasks, in which raw environment feedback is often kept in the interaction history to support future decisions. Repeatedly retaining this feedback, however, introduces substantial redundancy and causes the cumulative token cost to grow quadratically with the number of steps, which hinders long-horizon reasoning. Observation compression can mitigate this issue, but the heterogeneity of terminal environments makes heuristic-based and fixed-prompt methods hard to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories and can be attached to existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it yields consistent gains of 1%-4% across strong agentic models, and it improves accuracy by a further 2%-3% or so under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
