Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
October 23, 2024
Authors: Ashish Khisti, M. Reza Ebrahimi, Hassan Dbouk, Arash Behboodi, Roland Memisevic, Christos Louizos
cs.AI
Abstract
We consider multi-draft speculative sampling, where the proposal sequences
are sampled independently from different draft models. At each step, a
token-level draft selection scheme takes a list of valid tokens as input and
produces an output token whose distribution matches that of the target model.
Previous works have demonstrated that the optimal scheme (which maximizes the
probability of accepting one of the input tokens) can be cast as a solution to
a linear program. In this work we show that the optimal scheme can be
decomposed into a two-step solution: in the first step an importance sampling
(IS) type scheme is used to select one intermediate token; in the second step
(single-draft) speculative sampling is applied to generate the output token.
For the case of two identical draft models we further 1) establish a necessary
and sufficient condition on the distributions of the target and draft models
for the acceptance probability to equal one and 2) provide an explicit
expression for the optimal acceptance probability. Our theoretical analysis
also motivates a new class of token-level selection schemes based on weighted
importance sampling. Our experimental results demonstrate consistent
improvements in the achievable block efficiency and token rates over baseline
schemes in a number of scenarios.
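To make the second step of the decomposition concrete, the following is a minimal sketch of standard single-draft speculative sampling over a small finite vocabulary. It is an illustration under stated assumptions, not the authors' implementation: the function names `speculative_sample` and `acceptance_probability` and the list-based representation of the target distribution `p` and draft distribution `q` are hypothetical, and the first step (the importance-sampling selection of the intermediate token) is not shown because the abstract does not specify it.

```python
import random


def speculative_sample(p, q, draft_token, rng=random):
    """Single-draft speculative sampling (illustrative sketch).

    p, q        : target and draft probabilities over the same vocabulary
                  (lists of equal length, each summing to 1).
    draft_token : index of a token that was sampled from q.
    rng         : the `random` module or a `random.Random` instance.

    Returns (output_token, accepted). The output token is distributed
    according to p when draft_token is distributed according to q.
    """
    # Accept the draft token with probability min(1, p[x] / q[x]).
    ratio = p[draft_token] / q[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True

    # On rejection, resample from the residual max(p - q, 0).
    # A rejection implies p[x] < q[x] for the draft token, so p != q
    # and the residual has strictly positive total mass.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual)[0]
    return token, False


def acceptance_probability(p, q):
    """Probability that the draft token is accepted: sum_x min(p[x], q[x])."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))
```

In this single-draft setting the acceptance probability equals one minus the total variation distance between `p` and `q`, so it reaches one only when the two distributions coincide. The abstract's result for two identical draft models concerns the analogous quantity once a second draft token is available.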