Continuous Speculative Decoding for Autoregressive Image Generation

Abstract

Continuous-valued autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts, showing considerable reconstruction quality and higher generation fidelity. However, the computational demands of the autoregressive framework result in significant inference overhead. While speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to continuous-valued visual autoregressive models remains unexplored. This work generalizes the speculative decoding algorithm from discrete tokens to continuous space. By analyzing the intrinsic properties of the output distribution, we establish a tailored acceptance criterion for the diffusion distributions prevalent in such models. To overcome the inconsistency that arises in speculative decoding output distributions, we introduce denoising trajectory alignment and token pre-filling methods. Additionally, we identify a hard-to-sample distribution in the rejection phase. To mitigate this issue, we propose a careful acceptance-rejection sampling method with a proper upper bound, thereby circumventing complex integration. Experimental results show that our continuous speculative decoding achieves a 2.33× speed-up on off-the-shelf models while maintaining the output distribution. Code will be available at https://github.com/MarkXCloud/CSpD
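To make the accept/reject idea concrete, the following is a minimal sketch of speculative sampling on continuous densities. It is not taken from the authors' code. It assumes one-dimensional Gaussians as stand-ins for the draft density q and the target density p, whereas the models discussed here produce diffusion distributions. The function names `gaussian_pdf` and `speculative_sample` are hypothetical. A draft sample x ~ q is accepted with probability min(1, p(x)/q(x)). On rejection, a sample is drawn from the residual density proportional to max(0, p - q) by acceptance-rejection with p as the envelope, which avoids computing the normalizing integral.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (illustrative stand-in for a model density)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def speculative_sample(p_mean, p_std, q_mean, q_std, rng):
    """Draw one sample distributed according to the target p.

    Returns (sample, accepted), where accepted is True if the draft
    proposal was kept and False if the residual sampler was used.
    """
    # Draft step: propose x ~ q.
    x = rng.gauss(q_mean, q_std)
    p_x = gaussian_pdf(x, p_mean, p_std)
    q_x = gaussian_pdf(x, q_mean, q_std)

    # Acceptance criterion: keep x with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_x / q_x):
        return x, True

    # Rejection step: sample from the residual density, which is
    # proportional to max(0, p - q). Since max(0, p - q) <= p, the target p
    # serves as the envelope and no integral has to be evaluated.
    while True:
        y = rng.gauss(p_mean, p_std)
        p_y = gaussian_pdf(y, p_mean, p_std)
        q_y = gaussian_pdf(y, q_mean, q_std)
        # Equivalent to u < max(0, p - q) / p, written without a division.
        if rng.random() * p_y < max(0.0, p_y - q_y):
            return y, False


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [speculative_sample(0.0, 1.0, 0.5, 1.2, rng) for _ in range(20000)]
    accept_rate = sum(1 for _, ok in draws if ok) / len(draws)
    print(f"draft acceptance rate: {accept_rate:.3f}")
```

Under these assumptions the combined output density is min(p, q) + max(0, p - q) = p, so the samples follow the target regardless of how close the draft is. A closer draft only raises the acceptance rate.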
