Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation
June 4, 2026
Authors: Maxime Griot, Paul Steven Scotti, Tanishq Mathew Abraham
cs.AI
Abstract
Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation. Two teachers, Qwen3.5-397B-A17B and gpt-oss-120B, generate about 283k correct traces each; two instruction-tuned models then compress them to 8.6-21.0% of their original character length. Across a 48-run main grid plus seven Qwen-teacher truncation ablations, compressed traces reduce training tokens to 12-30% of raw, speed up training by 2.0-7.6x, and shorten inference outputs by 3-19x, with smaller reductions under the shorter gpt-oss teacher. However, raw traces retain the highest downstream accuracy at every scale and for both teachers. A length-matched raw-trace truncation ablation shows that compression is not merely benefiting from a smaller token budget: model-compressed traces usually beat or match naive truncation, especially for smaller students, while maintaining shorter inference outputs. Overall, reasoning-trace compression offers an accuracy-efficiency trade-off rather than a free improvement: students retain up to 96% of raw-trace accuracy while gaining up to 18x higher per-token efficiency, and at the 0.8B scale under LoRA, compressed traces narrow the raw-vs-compressed gap but do not exceed raw.
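To make the length-matched truncation baseline concrete, the following is a minimal sketch, assuming lengths are measured in characters (as the compression ratios above are) and that naive truncation keeps the prefix of the raw trace. The function names and the toy strings are hypothetical and do not come from the paper; the actual ablation may match lengths in tokens or truncate differently.

```python
def compression_ratio(raw: str, compressed: str) -> float:
    """Compressed length as a fraction of raw length, measured in characters."""
    if not raw:
        raise ValueError("raw trace must be non-empty")
    return len(compressed) / len(raw)


def length_matched_truncation(raw: str, compressed: str) -> str:
    """Naive baseline (assumed form): keep the first len(compressed) characters of raw."""
    return raw[: len(compressed)]


# Toy strings invented for illustration only; they are not data from the paper.
raw_trace = (
    "First, note that 12 * 12 = 144. Then subtract 44 to get 100. "
    "So the answer is 100."
)
compressed_trace = "12*12=144; 144-44=100."

ratio = compression_ratio(raw_trace, compressed_trace)
baseline = length_matched_truncation(raw_trace, compressed_trace)

print(f"compressed/raw = {ratio:.1%}")
print(f"truncated baseline: {baseline!r}")
```

Both training targets then have the same character budget, so any accuracy difference between them would reflect what the compressor keeps rather than how much text is kept.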