Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
October 23, 2024
Authors: Ashish Khisti, M. Reza Ebrahimi, Hassan Dbouk, Arash Behboodi, Roland Memisevic, Christos Louizos
cs.AI
Abstract
We consider multi-draft speculative sampling, where the proposal sequences
are sampled independently from different draft models. At each step, a
token-level draft selection scheme takes a list of valid tokens as input and
produces an output token whose distribution matches that of the target model.
Previous works have demonstrated that the optimal scheme (which maximizes the
probability of accepting one of the input tokens) can be cast as a solution to
a linear program. In this work we show that the optimal scheme can be
decomposed into a two-step solution: in the first step an importance sampling
(IS) type scheme is used to select one intermediate token; in the second step
(single-draft) speculative sampling is applied to generate the output token.
For the case of two identical draft models we further 1) establish a necessary
and sufficient condition on the distributions of the target and draft models
for the acceptance probability to equal one and 2) provide an explicit
expression for the optimal acceptance probability. Our theoretical analysis
also motivates a new class of token-level selection schemes based on weighted
importance sampling. Our experimental results demonstrate consistent
improvements in the achievable block efficiency and token rates over baseline
schemes in a number of scenarios.
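The second step named in the abstract, (single-draft) speculative sampling, can be sketched as follows. This is a minimal illustration of the standard accept/reject rule only, not the paper's multi-draft scheme: the importance-sampling first step and the linear program are omitted, and all function names and the example distributions are assumptions made for illustration.

```python
import random


def acceptance_probability(p_draft, q_target):
    """Probability that the drafted token is accepted: sum_x min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(p_draft, q_target))


def output_distribution(p_draft, q_target):
    """Exact distribution of the emitted token under the accept/reject rule.

    A token y is emitted either because it was drafted and accepted
    (probability min(p(y), q(y))) or because the draft was rejected and y was
    drawn from the normalized residual max(q - p, 0).
    """
    reject = 1.0 - acceptance_probability(p_draft, q_target)
    residual = [max(q - p, 0.0) for p, q in zip(p_draft, q_target)]
    total = sum(residual)
    if total == 0.0:  # p and q coincide, so the draft is never rejected
        return [min(p, q) for p, q in zip(p_draft, q_target)]
    return [
        min(p, q) + reject * r / total
        for p, q, r in zip(p_draft, q_target, residual)
    ]


def speculative_sample(p_draft, q_target, rng):
    """One step of single-draft speculative sampling.

    Returns (token, accepted). The token is distributed according to q_target.
    """
    tokens = range(len(p_draft))
    # Draw a candidate from the draft model.
    x = rng.choices(tokens, weights=p_draft, k=1)[0]
    # Accept it with probability min(1, q(x) / p(x)).
    if rng.random() < min(1.0, q_target[x] / p_draft[x]):
        return x, True
    # Otherwise resample from the residual distribution max(q - p, 0).
    residual = [max(q - p, 0.0) for p, q in zip(p_draft, q_target)]
    return rng.choices(tokens, weights=residual, k=1)[0], False


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical draft distribution
    q = [0.2, 0.3, 0.5]  # hypothetical target distribution
    print(acceptance_probability(p, q))
    print(output_distribution(p, q))
    print(speculative_sample(p, q, random.Random(0)))
```

In this single-draft setting the emitted token follows the target distribution regardless of the draft distribution; only the acceptance probability depends on how close the two are. The multi-draft schemes discussed in the abstract aim to raise that acceptance probability by considering several candidate tokens per step.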