
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents

April 4, 2026
作者: Ádám Kovács
cs.AI

Abstract

Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. We study task-conditioned tool-output pruning: given a focused query and one tool output, return the smallest verbatim evidence block the agent should inspect next. We introduce a benchmark of 11,477 examples built from SWE-bench repository interactions and synthetic multi-ecosystem tool outputs, with a manually curated 618-example test set. We fine-tune Qwen 3.5 2B with LoRA and compare it against larger zero-shot models and heuristic pruning baselines. Our model reaches 0.86 recall and 0.80 F1 while removing 92% of input tokens, outperforming zero-shot Qwen 3.5 35B A3B by 11 recall points and all heuristic baselines by a wide margin.
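The abstract compares the fine-tuned pruner against heuristic baselines. As a minimal sketch of what such a heuristic baseline might look like (the paper does not specify its baselines; the function name, scoring rule, and window size below are illustrative assumptions), one simple approach is to score each line of the tool output by keyword overlap with the query and return a contiguous window around the best-scoring line:

```python
def prune_tool_output(query: str, tool_output: str, max_lines: int = 20) -> str:
    """Hypothetical keyword-overlap baseline: keep a contiguous block of
    lines centered on the line that shares the most query terms."""
    # Ignore very short query tokens ("in", "a", ...) when scoring.
    query_terms = {t.lower() for t in query.split() if len(t) > 2}
    lines = tool_output.splitlines()
    scores = [sum(term in line.lower() for term in query_terms) for line in lines]
    if not lines or max(scores) == 0:
        # No term matches anywhere: fall back to simple head truncation.
        return tool_output[:2000]
    best = scores.index(max(scores))
    start = max(0, best - max_lines // 2)
    return "\n".join(lines[start:start + max_lines])
```

Such lexical heuristics are cheap but task-blind in the way the abstract describes: they cannot tell which matching region the agent actually needs next, which is the gap the task-conditioned model is trained to close.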