Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents

Abstract

Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. We study task-conditioned tool-output pruning: given a focused query and one tool output, return the smallest verbatim evidence block the agent should inspect next. We introduce a benchmark of 11,477 examples built from SWE-bench repository interactions and synthetic multi-ecosystem tool outputs, with a manually curated 618-example test set. We fine-tune Qwen 3.5 2B with LoRA and compare it against larger zero-shot models and heuristic pruning baselines. Our model reaches 0.86 recall and 0.80 F1 while removing 92% of input tokens, outperforming zero-shot Qwen 3.5 35B A3B by 11 recall points and all heuristic baselines by a wide margin.
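As a rough illustration of how recall, F1, and token reduction might be computed for a single example, the sketch below scores a pruned output against gold evidence. It rests on assumptions not stated in the abstract: matching is done at line granularity, tokens are approximated by whitespace splitting, and the function name `score_pruning` is hypothetical. The benchmark's actual metric definitions may differ.

```python
def score_pruning(tool_output: str, predicted: str, gold: str) -> dict:
    """Score one pruned output against gold evidence.

    Assumptions (not taken from the paper): evidence is compared line by
    line, and tokens are approximated by whitespace-separated words.
    """
    # Compare non-empty lines as sets, since the task asks for verbatim evidence.
    pred_lines = {ln for ln in predicted.splitlines() if ln.strip()}
    gold_lines = {ln for ln in gold.splitlines() if ln.strip()}
    hits = len(pred_lines & gold_lines)

    recall = hits / len(gold_lines) if gold_lines else 1.0
    precision = hits / len(pred_lines) if pred_lines else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Fraction of the original (approximate) tokens that the pruner dropped.
    total = len(tool_output.split())
    kept = len(predicted.split())
    removed = 1 - kept / total if total else 0.0

    return {"recall": recall, "precision": precision, "f1": f1, "removed": removed}


if __name__ == "__main__":
    output = "collected 4 items\ntest_a PASSED\ntest_b FAILED\nAssertionError: 1 != 2"
    gold = "test_b FAILED\nAssertionError: 1 != 2"
    pred = "test_b FAILED\nAssertionError: 1 != 2"
    print(score_pruning(output, pred, gold))
```

Under these assumptions, a pruner that keeps every gold line plus extra context retains full recall but loses precision and removes fewer tokens, which is the trade-off the reported recall, F1, and 92% reduction figures summarize.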
