

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

April 21, 2026
Authors: Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, Chenghua Lin
cs.AI

Abstract

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (namely SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
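The quadratic token growth the abstract describes can be sketched with a small back-of-the-envelope model: if every step's raw observation stays in the history, then the prompt at step t carries roughly t observations, so the tokens sent across all steps sum to O(steps²). The sketch below is illustrative only (it is not the paper's code); the step count, observation size, and 10x compression ratio are made-up numbers for the illustration.

```python
# Illustrative sketch (not TACO itself): cumulative prompt tokens when every
# prior observation is replayed in each step's context.

def cumulative_prompt_tokens(steps: int, tokens_per_obs: int) -> int:
    """Total tokens sent across all steps, assuming the full observation
    history is re-sent with every step's prompt."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_obs   # one new observation appended to history
        total += history            # entire history re-sent at this step
    return total

# Hypothetical numbers: 50 steps, 400 raw tokens per observation,
# vs. a compressor that shrinks each observation to 40 tokens.
raw = cumulative_prompt_tokens(steps=50, tokens_per_obs=400)
compressed = cumulative_prompt_tokens(steps=50, tokens_per_obs=40)

# Closed form: tokens_per_obs * steps * (steps + 1) / 2, i.e. O(steps^2).
# Compression shrinks the constant factor, not the quadratic shape.
assert raw == 400 * 50 * 51 // 2
assert compressed == raw // 10
```

Both curves remain quadratic in the number of steps; per-observation compression only scales the constant, which is why the paper pairs compression with task-aware rules rather than relying on it alone.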