LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
February 27, 2026
Authors: Alexander Samarin, Sergei Krutikov, Anton Shevtsov, Sergei Skvortsov, Filipp Fisin, Alexander Golubev
cs.AI
Abstract
Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is largely determined by the acceptance rate, yet standard training minimizes the Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, owing to their limited capacity, typically converge to suboptimal solutions at which minimizing KL does not guarantee maximizing the acceptance rate. To address this issue, we propose LK losses, training objectives that directly target the acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to standard KL-based training. We evaluate our approach on general-text, coding, and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead, and can be directly integrated into any existing speculator training framework, making them a compelling alternative to existing draft training objectives.
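To make the contrast between the two objectives concrete, the following minimal PyTorch sketch compares the standard KL proxy loss with an objective that directly maximizes the expected token-acceptance probability of speculative sampling, which for a single position equals the distribution overlap sum_x min(p_target(x), q_draft(x)). This is an illustration under stated assumptions, not the paper's actual LK losses; the function names (`expected_acceptance`, `acceptance_loss`, `kl_proxy_loss`) are hypothetical.

```python
# Illustrative sketch only: the paper's LK losses may differ in form.
# Speculative sampling accepts a drafted token with probability
# sum_x min(p_target(x), q_draft(x)) = 1 - TV(p_target, q_draft),
# so a direct objective can maximize this overlap instead of minimizing KL.
import torch
import torch.nn.functional as F


def expected_acceptance(target_logits: torch.Tensor, draft_logits: torch.Tensor) -> torch.Tensor:
    """Expected per-token acceptance rate, averaged over batch and sequence.

    Args:
        target_logits: [batch, seq, vocab] logits from the (frozen) target model.
        draft_logits:  [batch, seq, vocab] logits from the draft model being trained.
    """
    p = F.softmax(target_logits, dim=-1)           # target distribution (treated as constant)
    q = F.softmax(draft_logits, dim=-1)            # draft distribution (receives gradients)
    return torch.minimum(p, q).sum(dim=-1).mean()  # overlap = 1 - total variation distance


def acceptance_loss(target_logits: torch.Tensor, draft_logits: torch.Tensor) -> torch.Tensor:
    # Negate so that minimizing the loss maximizes expected acceptance.
    return -expected_acceptance(target_logits.detach(), draft_logits)


def kl_proxy_loss(target_logits: torch.Tensor, draft_logits: torch.Tensor) -> torch.Tensor:
    # Standard proxy objective: KL(p_target || q_draft).
    p = F.softmax(target_logits.detach(), dim=-1)
    log_q = F.log_softmax(draft_logits, dim=-1)
    return F.kl_div(log_q, p, reduction="batchmean")
```

Both losses share the same global optimum (q_draft = p_target), but when the draft model cannot represent the target exactly, the two objectives generally favor different suboptimal solutions, which is the gap the paper's acceptance-oriented training targets.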