Token Perturbation Guidance for Diffusion Models

Abstract

Classifier-free guidance (CFG) has become an essential component of modern diffusion models, improving both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We further analyze the guidance term provided by TPG and show that its effect on sampling resembles that of CFG more closely than existing training-free guidance techniques do. Extensive experiments on SDXL and Stable Diffusion 2.1 show that TPG achieves nearly a 2× improvement in FID for unconditional generation over the SDXL baseline, while closely matching CFG in prompt alignment. These results establish TPG as a general, condition-agnostic guidance method that brings CFG-like benefits to a broader class of diffusion models. The code is available at https://github.com/TaatiTeam/Token-Perturbation-Guidance.
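To make the "norm-preserving shuffling" idea concrete, the following is a minimal sketch, not the authors' implementation. It treats an intermediate representation as a tokens-by-channels matrix, reorders its rows with a random permutation, and checks that the overall norm is unchanged. The abstract does not give the guidance formula, so the CFG-style extrapolation in `guided_prediction` is an assumption made by analogy with CFG, and all function names here are hypothetical.

```python
import math
import random


def shuffle_tokens(tokens, rng):
    """Return the token rows in a random order (a permutation of the token axis)."""
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return [list(tokens[i]) for i in order]


def frobenius_norm(tokens):
    """Frobenius norm of a tokens-by-channels matrix stored as nested lists."""
    return math.sqrt(sum(v * v for row in tokens for v in row))


def guided_prediction(pred, pred_perturbed, scale):
    """ASSUMPTION: CFG-style extrapolation away from the perturbed prediction.

    The abstract does not state this formula; it is borrowed from the usual
    CFG form purely for illustration.
    """
    return [p + scale * (p - q) for p, q in zip(pred, pred_perturbed)]


rng = random.Random(0)
# Toy "intermediate representation": 6 tokens with 4 channels each.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
shuffled = shuffle_tokens(tokens, rng)

# Shuffling only reorders rows, so the norm is preserved up to float rounding.
norm_gap = abs(frobenius_norm(tokens) - frobenius_norm(shuffled))
```

In a real diffusion network the shuffle would be applied inside the denoiser on a second forward pass, and the two noise predictions would then be combined; the sketch only isolates the permutation and its norm-preserving property.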
