Token Perturbation Guidance for Diffusion Models
June 10, 2025
Authors: Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat, Babak Taati
cs.AI
Abstract
Classifier-free guidance (CFG) has become an essential component of modern
diffusion models to enhance both generation quality and alignment with input
conditions. However, CFG requires specific training procedures and is limited
to conditional generation. To address these limitations, we propose Token
Perturbation Guidance (TPG), a novel method that applies perturbation matrices
directly to intermediate token representations within the diffusion network.
TPG employs a norm-preserving shuffling operation to provide effective and
stable guidance signals that improve generation quality without architectural
changes. As a result, TPG is training-free and agnostic to input conditions,
making it readily applicable to both conditional and unconditional generation.
We further analyze the guidance term provided by TPG and show that its effect
on sampling more closely resembles CFG than that of existing training-free
guidance techniques. Extensive experiments on SDXL and Stable Diffusion 2.1
show that TPG achieves nearly a 2× improvement in FID for unconditional
generation over the SDXL baseline, while closely matching CFG in prompt
alignment. These results establish TPG as a general, condition-agnostic
guidance method that brings CFG-like benefits to a broader class of diffusion
models. The code is available at
https://github.com/TaatiTeam/Token-Perturbation-Guidance
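To make the two ideas above concrete (a norm-preserving token shuffle applied inside the network, and a guidance term built from the resulting perturbed prediction), here is a minimal plain-Python sketch. It is not the released implementation and does not reproduce SDXL or Stable Diffusion 2.1. The function names `shuffle_tokens`, `guided_prediction`, and `toy_denoiser` are hypothetical. `toy_denoiser` is an invented two-layer stand-in for a diffusion network. The combination rule `eps + scale * (eps - eps_perturbed)` is an assumption: it has the same form as CFG, with the perturbed prediction taking the place of the unconditional one.

```python
import math
import random
from typing import List, Sequence

# A token sequence: N tokens, each a d-dimensional feature vector (rows of X).
Tokens = List[List[float]]


def shuffle_tokens(tokens: Tokens, rng: random.Random) -> Tokens:
    """Apply a random permutation matrix P along the token axis (X -> P X).

    Rows are reordered but never modified, so each token's norm and the
    Frobenius norm of the whole sequence are unchanged.
    """
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return [list(tokens[i]) for i in order]


def frobenius_norm(tokens: Tokens) -> float:
    """Square root of the sum of squares of all entries."""
    return math.sqrt(sum(v * v for row in tokens for v in row))


def guided_prediction(eps: Sequence[float], eps_perturbed: Sequence[float],
                      scale: float) -> List[float]:
    """Assumed CFG-style rule: eps + scale * (eps - eps_perturbed)."""
    return [e + scale * (e - p) for e, p in zip(eps, eps_perturbed)]


def toy_denoiser(x: Sequence[float], perturb: bool,
                 rng: random.Random) -> List[float]:
    """Hypothetical two-layer stand-in for a diffusion network (not SDXL).

    Layer 1 lifts each input value into a 2-d token. The intermediate tokens
    are optionally shuffled. Layer 2 reads them out with position-dependent
    weights, so shuffling changes the prediction without changing any weights.
    """
    tokens = [[v, 0.5 * v] for v in x]                 # layer 1: per-token features
    if perturb:
        tokens = shuffle_tokens(tokens, rng)           # perturb intermediate tokens
    return [0.1 * (i + 1) * (t[0] + t[1])              # layer 2: positional readout
            for i, t in enumerate(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    x_t = [0.3, -1.2, 0.8, 2.0]
    eps = toy_denoiser(x_t, perturb=False, rng=rng)     # ordinary prediction
    eps_hat = toy_denoiser(x_t, perturb=True, rng=rng)  # token-shuffled prediction
    print(guided_prediction(eps, eps_hat, scale=3.0))
```

Setting `scale` to 0 recovers the unguided prediction. A permutation is used instead of, for example, additive noise because it only reorders tokens. It therefore cannot change their magnitudes, which is the norm-preserving property the abstract credits for stable guidance signals.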