LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
February 27, 2026
Authors: Alexander Samarin, Sergei Krutikov, Anton Shevtsov, Sergei Skvortsov, Filipp Fisin, Alexander Golubev
cs.AI
Abstract
Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is largely determined by the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions in which minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, special training objectives that directly target the acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to standard KL-based training. We evaluate our approach on general, coding, and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead, and can be directly integrated into any existing speculator training framework, making them a compelling alternative to existing draft training objectives.
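The abstract does not spell out the LK loss formulation, but the contrast it draws can be sketched. Under the standard speculative-sampling verification rule, a drafted token is accepted with probability sum_x min(p(x), q(x)) = 1 - TV(p, q), where p is the target distribution and q the draft distribution, so an objective that directly targets acceptance can penalize the missing probability overlap rather than the KL divergence. The following is a minimal, hypothetical PyTorch sketch of that idea; the function names and the surrogate loss are illustrative assumptions, not the paper's LK losses.

```python
# Hypothetical sketch contrasting the standard KL proxy objective with a
# direct acceptance-rate surrogate (illustrative, not the paper's LK losses).
import torch
import torch.nn.functional as F

def kl_proxy_loss(draft_logits, target_logits):
    # Standard proxy: KL(p_target || q_draft), averaged over token positions.
    log_q = F.log_softmax(draft_logits, dim=-1)
    log_p = F.log_softmax(target_logits, dim=-1)
    p = log_p.exp()
    return (p * (log_p - log_q)).sum(dim=-1).mean()

def acceptance_surrogate_loss(draft_logits, target_logits):
    # Under speculative sampling, a drafted token is accepted with probability
    # sum_x min(p(x), q(x)) = 1 - TV(p, q). Minimizing the missing overlap
    # therefore maximizes the expected per-token acceptance rate.
    q = F.softmax(draft_logits, dim=-1)
    p = F.softmax(target_logits, dim=-1).detach()  # target model is frozen
    overlap = torch.minimum(p, q).sum(dim=-1)      # expected acceptance probability
    return (1.0 - overlap).mean()
```

Both objectives vanish only when q matches p exactly, which is the shared global optimum the abstract refers to; under limited draft capacity, however, they pull toward different compromises, and one could, for example, anneal from the KL proxy to the acceptance surrogate over training. This is a reading of the problem setup, not the published method.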