Token Perturbation Guidance for Diffusion Models
June 10, 2025
Authors: Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat, Babak Taati
cs.AI
Abstract
Classifier-free guidance (CFG) has become an essential component of modern
diffusion models to enhance both generation quality and alignment with input
conditions. However, CFG requires specific training procedures and is limited
to conditional generation. To address these limitations, we propose Token
Perturbation Guidance (TPG), a novel method that applies perturbation matrices
directly to intermediate token representations within the diffusion network.
TPG employs a norm-preserving shuffling operation to provide effective and
stable guidance signals that improve generation quality without architectural
changes. As a result, TPG is training-free and agnostic to input conditions,
making it readily applicable to both conditional and unconditional generation.
We further analyze the guidance term provided by TPG and show that its effect
on sampling resembles CFG more closely than that of existing training-free
guidance techniques. Extensive experiments on SDXL and Stable Diffusion 2.1
show that TPG achieves nearly a 2× improvement in FID for unconditional
generation over the SDXL baseline, while closely matching CFG in prompt
alignment. These results establish TPG as a general, condition-agnostic
guidance method that brings CFG-like benefits to a broader class of diffusion
models. The code is available at
https://github.com/TaatiTeam/Token-Perturbation-Guidance
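
The two ingredients named in the abstract, a norm-preserving token shuffle and a guidance term built from the perturbed prediction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `shuffle_tokens` and `guided_prediction`, the parameter `scale`, the toy shapes, and the CFG-style extrapolation formula are all assumptions made for illustration, and the "predictions" are random placeholders rather than outputs of a diffusion network.

```python
import numpy as np


def shuffle_tokens(tokens, rng):
    """Randomly permute the token axis of a (num_tokens, dim) array.

    Returns the shuffled tokens and the permutation used. Hypothetical
    helper, not taken from the TPG repository.
    """
    perm = rng.permutation(tokens.shape[0])
    return tokens[perm], perm


def guided_prediction(pred, pred_perturbed, scale):
    """Extrapolate away from the perturbed prediction.

    Assumed CFG-style form: pred + scale * (pred - pred_perturbed).
    """
    return pred + scale * (pred - pred_perturbed)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 tokens, 8 channels (toy sizes)
shuffled, perm = shuffle_tokens(tokens, rng)

# The shuffle equals left-multiplication by a permutation matrix P.
# P is orthogonal (P.T @ P = I), which is why the norm is preserved.
P = np.eye(tokens.shape[0])[perm]

# Placeholder "denoiser outputs": in a real sampler these would come from
# a normal forward pass and a forward pass with shuffled intermediate tokens.
pred = rng.standard_normal((16, 8))
pred_perturbed = rng.standard_normal((16, 8))
guided = guided_prediction(pred, pred_perturbed, scale=3.0)
```

Because the perturbation only reorders tokens inside the network, nothing in this sketch depends on a text prompt or class label, which is consistent with the abstract's claim that the method applies to both conditional and unconditional generation.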