BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
May 21, 2025
Authors: Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan, Zhuoran Yang
cs.AI
Abstract
Speculative decoding has emerged as a popular method to accelerate the
inference of Large Language Models (LLMs) while retaining their superior text
generation performance. Previous methods either adopt a fixed speculative
decoding configuration regardless of the prefix tokens, or train draft models
in an offline or online manner to align them with the context. This paper
proposes a training-free online learning framework to adaptively choose the
configuration of the hyperparameters for speculative decoding as text is being
generated. We first formulate this hyperparameter selection problem as a
Multi-Armed Bandit problem and provide a general speculative decoding framework
BanditSpec. Furthermore, two bandit-based hyperparameter selection algorithms,
UCBSpec and EXP3Spec, are designed and analyzed in terms of a novel quantity,
the stopping time regret. We upper bound this regret under both stochastic and
adversarial reward settings. By deriving an information-theoretic impossibility
result, we show that the regret performance of UCBSpec is optimal up to
universal constants. Finally, extensive empirical experiments with LLaMA3 and
Qwen2 demonstrate that our algorithms are effective compared to existing
methods, and that their throughput is close to that of the oracle best hyperparameter in
simulated real-life LLM serving scenarios with diverse input prompts.
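
To make the bandit formulation in the abstract more concrete, here is a minimal sketch of how hyperparameter selection during generation could be treated as a multi-armed bandit. It uses the textbook UCB1 rule and is not the paper's UCBSpec, whose exact index and reward definition are not given in the abstract. Everything in it is an illustrative assumption: the function names `ucb1_select` and `simulate`, the per-arm acceptance rates, the `max_draft` length, and the reward scaling. The loop runs until a target number of tokens has been generated, so the number of rounds is itself random, which is loosely the setting that a stopping-time notion of regret would address.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Pick an arm with the standard UCB1 rule at round t (1-based)."""
    # Play every arm once before trusting the confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(accept_rates, target_tokens, max_draft=4, seed=0):
    """Toy decoding loop; each arm is a hypothetical speculative-decoding config.

    accept_rates[i] is an assumed per-token acceptance probability for arm i.
    Returns (pull counts per arm, number of rounds until target_tokens).
    """
    rng = random.Random(seed)
    k = len(accept_rates)
    counts = [0] * k
    means = [0.0] * k
    generated, rounds = 0, 0
    while generated < target_tokens:
        rounds += 1
        arm = ucb1_select(counts, means, rounds)
        # Simulated verification: drafted tokens are accepted one by one
        # until the first rejection or until max_draft tokens are accepted.
        accepted = 0
        while accepted < max_draft and rng.random() < accept_rates[arm]:
            accepted += 1
        tokens = accepted + 1  # assume the target model always adds one token
        generated += tokens
        reward = tokens / (max_draft + 1)  # scaled into [0, 1] for UCB1
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, rounds


if __name__ == "__main__":
    pulls, n_rounds = simulate([0.2, 0.5, 0.9], target_tokens=20000)
    print("pulls per arm:", pulls, "rounds:", n_rounds)
```

In this toy setup the arm with the highest assumed acceptance rate yields the most tokens per round, so a reasonable bandit rule should end up pulling it most often and finish in fewer rounds than a poorly chosen fixed configuration. In a real system the reward would come from the actual verification step of the target model rather than from a simulated acceptance rate.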