

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

April 21, 2026
作者: Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, Chenghua Lin
cs.AI

Abstract

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
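The quadratic token-cost claim in the abstract follows from a simple accounting argument: if every raw observation stays in the interaction history, step *i* re-reads all *i* prior observations, so the cumulative tokens processed over *N* steps scale as O(N²) in the observation size. The sketch below (an illustration of this accounting, not the paper's TACO implementation; all function names and the compressed-size parameter are hypothetical) contrasts raw retention with per-step observation compression:

```python
# Illustrative accounting (not the paper's implementation): why keeping
# raw observations in history makes cumulative token cost quadratic in
# the number of steps, and how per-observation compression changes that.

def cumulative_tokens_raw(num_steps: int, obs_tokens: int) -> int:
    """Tokens processed across all steps when every past observation is
    kept verbatim: step i re-reads i observations of obs_tokens each."""
    return sum(i * obs_tokens for i in range(1, num_steps + 1))

def cumulative_tokens_compressed(num_steps: int, obs_tokens: int,
                                 compressed_tokens: int) -> int:
    """Same accounting when each observation is compressed down to
    compressed_tokens once its step is over; the current step still
    sees its own observation in full."""
    total = 0
    for i in range(1, num_steps + 1):
        total += obs_tokens + (i - 1) * compressed_tokens
    return total

# Hypothetical numbers: 100 steps, 500-token raw observations,
# compressed to 50 tokens each.
raw = cumulative_tokens_raw(100, 500)
comp = cumulative_tokens_compressed(100, 500, 50)
print(raw, comp)  # raw grows ~quadratically; compression flattens it
```

Under these made-up sizes, raw retention processes roughly 8.5x the tokens of the compressed variant; the quadratic term survives in both, but its coefficient shrinks by the compression ratio, which is the lever a learned compression policy tunes.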