AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
October 22, 2025
Authors: Yuezhou Hu, Jiaxin Guo, Xinyu Feng, Tuo Zhao
cs.AI
Abstract
Speculative Decoding (SD) accelerates large language model inference by
employing a small draft model to generate predictions, which are then verified
by a larger target model. The effectiveness of SD hinges on the alignment
between these models, which is typically enhanced by Knowledge Distillation
(KD). However, conventional KD methods aim to minimize the KL divergence
between the draft and target models across all tokens, a goal that is
misaligned with the true objective of SD, which is to maximize token acceptance
rate. Moreover, capacity constraints often prevent the draft model from fully
assimilating the target model's knowledge, leading to suboptimal
performance. To address this challenge, we propose AdaSPEC, a novel method that
incorporates selective token filtering into the KD process. AdaSPEC utilizes a
reference model to identify and filter out difficult-to-fit tokens, enabling
the distillation of a draft model that better aligns with the target model on
simpler tokens. This approach improves the overall token acceptance rate
without compromising generation quality. We evaluate AdaSPEC across diverse
tasks, including arithmetic reasoning, instruction-following, coding, and
summarization, using model configurations of 31M/1.4B and 350M/2.7B parameters.
Our results demonstrate that AdaSPEC consistently outperforms the
state-of-the-art DistillSpec method, achieving higher acceptance rates across
all tasks (up to 15%). The code is publicly available at
https://github.com/yuezhouhu/adaspec.
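
The selective filtering idea in the abstract can be sketched in plain Python. This is a hypothetical illustration under stated assumptions, not the released implementation: the names `kl_div`, `select_easy_tokens`, `selective_kd_loss`, and `keep_ratio` are invented for this sketch, and the actual method derives per-token difficulty from a reference model's losses rather than taking them as a precomputed list.

```python
import math

def kl_div(p, q):
    # KL(p || q) between two probability distributions over the vocabulary.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def select_easy_tokens(ref_losses, keep_ratio=0.5):
    # Filtering step (sketch): rank token positions by how hard a reference
    # model finds them (e.g., its per-token loss) and keep the easiest fraction.
    k = max(1, int(len(ref_losses) * keep_ratio))
    order = sorted(range(len(ref_losses)), key=lambda i: ref_losses[i])
    return sorted(order[:k])

def selective_kd_loss(draft_probs, target_probs, ref_losses, keep_ratio=0.5):
    # Distill only on the selected "easy" positions; difficult-to-fit tokens
    # are skipped so the capacity-limited draft model focuses its capacity
    # where alignment with the target is achievable.
    kept = select_easy_tokens(ref_losses, keep_ratio)
    return sum(kl_div(target_probs[i], draft_probs[i]) for i in kept) / len(kept)
```

Conventional KD would average the KL term over every position; the only change here is the masking step, which is what ties the training objective to the token acceptance rate the abstract targets.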