Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

October 2, 2024
Authors: Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
cs.AI

Abstract

Current large auto-regressive models can generate high-quality, high-resolution images, but they require hundreds or even thousands of next-token prediction steps during inference, which makes generation slow. Jacobi decoding, an iterative parallel decoding algorithm, has been used in prior work to accelerate auto-regressive generation and can be applied without training. However, Jacobi decoding relies on a deterministic criterion to determine whether the iterations have converged. It therefore works for greedy decoding but is incompatible with sampling-based decoding, which is crucial for visual quality and diversity in current auto-regressive text-to-image generation. In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. By introducing a probabilistic convergence criterion, SJD accelerates inference while preserving the randomness of sampling-based token decoding, so the model can still generate diverse images. Specifically, SJD lets the model predict multiple tokens at each step and accepts tokens according to the probabilistic criterion, enabling the model to generate images in fewer steps than the conventional next-token-prediction paradigm. We also investigate token initialization strategies that leverage the spatial locality of visual data to further improve the acceleration ratio in specific scenarios. We evaluate SJD on multiple auto-regressive text-to-image generation models and show that it accelerates them without sacrificing visual quality.
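
The draft-then-accept loop described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the probabilistic criterion, so the acceptance rule here (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized residual max(p - q, 0)) is borrowed from standard speculative sampling. `VOCAB`, `toy_probs`, `sample`, and `speculative_jacobi_decode` are hypothetical names, and `toy_probs` is a toy stand-in for a real auto-regressive model.

```python
import random

VOCAB = 8  # hypothetical toy vocabulary size


def toy_probs(prefix):
    """Hypothetical stand-in for a model: a next-token distribution
    that depends deterministically on the prefix."""
    key = sum((i + 1) * (t + 1) for i, t in enumerate(prefix))
    rng = random.Random(key)
    weights = [rng.random() + 0.05 for _ in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


def sample(probs, rng):
    """Draw one token index; weights need not be normalized."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_jacobi_decode(prompt, length, window=4, seed=0):
    """Generate `length` tokens, drafting `window` tokens per step.
    Returns (tokens, number_of_steps)."""
    rng = random.Random(seed)
    uniform = [1.0 / VOCAB] * VOCAB
    accepted = []
    # Draft tokens plus the distributions they were drawn from.
    draft = [sample(uniform, rng) for _ in range(window)]
    draft_q = [uniform] * window
    steps = 0
    while len(accepted) < length:
        steps += 1
        context = prompt + accepted
        # A real model would produce all of these in one forward pass
        # with a causal mask; the toy model is simply called per position.
        p = [toy_probs(context + draft[:i]) for i in range(len(draft))]
        # Assumed criterion: accept left to right with prob. min(1, p/q).
        n_ok = 0
        for i, tok in enumerate(draft):
            if rng.random() < min(1.0, p[i][tok] / draft_q[i][tok]):
                n_ok += 1
            else:
                break
        accepted.extend(draft[:n_ok])
        new_draft, new_q = [], []
        if n_ok < len(draft):
            # First rejected position: resample from the residual and keep it.
            residual = [max(a - b, 0.0) for a, b in zip(p[n_ok], draft_q[n_ok])]
            accepted.append(sample(residual, rng))
            # Later positions become fresh drafts from this pass's predictions.
            for j in range(n_ok + 1, len(draft)):
                new_draft.append(sample(p[j], rng))
                new_q.append(p[j])
        # Refill the window with randomly initialized drafts.
        while len(new_draft) < window:
            new_draft.append(sample(uniform, rng))
            new_q.append(uniform)
        draft, draft_q = new_draft, new_q
    return accepted[:length], steps


if __name__ == "__main__":
    tokens, steps = speculative_jacobi_decode([1, 2], length=20, window=4)
    print(f"generated {len(tokens)} tokens in {steps} parallel steps")
```

Every step commits at least one token, so the step count never exceeds that of plain next-token prediction, and with `window=1` the loop produces exactly one token per step. The random refill of the window is only a placeholder; the initialization strategies based on spatial locality mentioned in the abstract are not modeled here.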
