LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Abstract

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that the target model then verifies in parallel. The speedup is largely determined by the acceptance rate, yet standard training minimizes the Kullback-Leibler (KL) divergence as a proxy objective. Although KL divergence and acceptance rate share the same global optimum, small draft models have limited capacity and typically converge to suboptimal solutions, where minimizing KL does not guarantee maximizing the acceptance rate. To address this issue, we propose LK losses, training objectives that directly target the acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, show consistent improvements in acceptance metrics over standard KL-based training in all configurations. We evaluate our approach on general, coding, and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead, and can be integrated directly into any existing speculator training framework, making them a compelling alternative to existing draft training objectives.
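The abstract argues that KL divergence and acceptance rate can disagree away from their shared optimum. The sketch below illustrates that point on toy numbers. Under the standard speculative sampling acceptance rule, a token drafted from q is accepted against target p with probability alpha = sum_x min(p(x), q(x)), which equals 1 - TV(p, q). The code makes several assumptions that are not from the paper. It uses forward KL(p || q) as the "standard" objective. The distributions `TARGET`, `DRAFT_A`, and `DRAFT_B` are made-up values. `acceptance_loss` is a hypothetical stand-in for an acceptance-targeting objective and is not the LK loss definition. `expected_tokens_per_step` uses the textbook estimate that assumes each drafted token is accepted independently.

```python
import math

# Hypothetical next-token distributions over a 3-token vocabulary.
TARGET = [0.6, 0.3, 0.1]
DRAFT_A = [0.4, 0.4, 0.2]
DRAFT_B = [0.6, 0.39, 0.01]


def acceptance_rate(p, q):
    """Probability that a token sampled from draft q is accepted under the
    standard speculative sampling rule: sum_x min(p(x), q(x)) = 1 - TV(p, q)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def forward_kl(p, q):
    """KL(p || q). Terms with p(x) = 0 contribute nothing; assumes q(x) > 0
    wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def acceptance_loss(p, q):
    """Hypothetical objective that targets acceptance directly (1 - alpha).
    Illustrative only; this is NOT the paper's LK loss."""
    return 1.0 - acceptance_rate(p, q)


def expected_tokens_per_step(alpha, gamma):
    """Expected tokens per target forward pass with gamma drafted tokens,
    assuming each one is accepted independently with probability alpha
    (the count includes the extra token sampled from the target)."""
    if alpha >= 1.0:
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for name, q in [("A", DRAFT_A), ("B", DRAFT_B)]:
        alpha = acceptance_rate(TARGET, q)
        print(
            f"draft {name}: KL={forward_kl(TARGET, q):.3f} "
            f"alpha={alpha:.2f} "
            f"loss={acceptance_loss(TARGET, q):.2f} "
            f"tokens/step(gamma=4)={expected_tokens_per_step(alpha, 4):.2f}"
        )
```

With these toy numbers, draft A has the lower KL divergence, while draft B has the higher acceptance rate (about 0.91 versus 0.80). The two objectives therefore rank the drafts differently, even though both are optimized exactly when q = p.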
