Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation

October 10, 2025
作者: Yao Teng, Fuyun Wang, Xian Liu, Zhekai Chen, Han Shi, Yu Wang, Zhenguo Li, Weiyang Liu, Difan Zou, Xihui Liu
cs.AI

Abstract

As a new paradigm of visual content generation, autoregressive text-to-image models suffer from slow inference due to their sequential token-by-token decoding, often requiring thousands of model forward passes to generate a single image. To address this inefficiency, we propose Speculative Jacobi-Denoising Decoding (SJD2), a framework that incorporates a denoising process into Jacobi iterations to enable parallel token generation in autoregressive models. Our method introduces a next-clean-token prediction paradigm in which a pre-trained autoregressive model, after low-cost fine-tuning, accepts noise-perturbed token embeddings and predicts the next clean tokens. This denoising paradigm guides the model toward more stable Jacobi trajectories. During inference, our method initializes token sequences with Gaussian noise and performs iterative next-clean-token prediction in the embedding space. We employ a probabilistic criterion to verify and accept multiple tokens in parallel, and refine the unaccepted tokens along the denoising trajectory for the next iteration. Experiments show that our method accelerates generation by reducing the number of model forward passes while maintaining the visual quality of the generated images.
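
To make the decoding loop concrete, the sketch below traces the steps the abstract describes: noise initialization, parallel next-clean-token prediction, probabilistic acceptance, and denoising refinement of the unaccepted tail. It is a minimal illustration, not the authors' implementation: the `model` interface, the greedy drafting, the acceptance test, and the fixed step size `step` are all simplifying assumptions.

```python
import torch

def sjd2_decode(model, prompt_embeds, seq_len, embed_dim,
                max_iters=64, step=0.5):
    """Decode seq_len tokens via Jacobi-denoising iterations (illustrative).

    model(prompt_embeds, draft) is assumed to return, for every draft
    position in parallel, (logits over the vocabulary, predicted clean
    next-token embeddings), both of length seq_len.
    """
    device = prompt_embeds.device
    # 1. Initialize the whole draft sequence with Gaussian noise.
    draft = torch.randn(seq_len, embed_dim, device=device)
    accepted = 0  # length of the verified prefix

    for _ in range(max_iters):
        # 2. One forward pass predicts next clean tokens at all positions.
        logits, clean_embeds = model(prompt_embeds, draft)
        probs = logits.softmax(dim=-1)
        proposal = probs.argmax(dim=-1)  # greedy draft, for simplicity

        # 3. Probabilistic verification: grow the accepted prefix while
        #    each drafted token is likely enough under the current model
        #    distribution (a stand-in for the paper's acceptance rule).
        while accepted < seq_len and \
                torch.rand(()) < probs[accepted, proposal[accepted]]:
            accepted += 1
        if accepted == seq_len:
            return proposal

        # 4. Freeze verified positions at their clean embeddings, and move
        #    the remaining noisy embeddings one denoising step toward the
        #    predicted clean ones for the next Jacobi iteration.
        draft[:accepted] = clean_embeds[:accepted]
        draft[accepted:] = (1 - step) * draft[accepted:] \
            + step * clean_embeds[accepted:]

    return proposal  # fall back to the last draft if the budget runs out
```

Under this reading, the number of forward passes equals the number of Jacobi iterations rather than the sequence length, which is where the claimed speedup over token-by-token decoding comes from.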