Token Perturbation Guidance for Diffusion Models

Abstract

Classifier-free guidance (CFG) has become an essential component of modern diffusion models, improving both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We further analyze the guidance term provided by TPG and show that its effect on sampling resembles CFG more closely than existing training-free guidance techniques do. Extensive experiments on SDXL and Stable Diffusion 2.1 show that TPG achieves nearly a 2× improvement in FID for unconditional generation over the SDXL baseline, while closely matching CFG in prompt alignment. These results establish TPG as a general, condition-agnostic guidance method that brings CFG-like benefits to a broader class of diffusion models. The code is available at https://github.com/TaatiTeam/Token-Perturbation-Guidance
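As a rough illustration of the norm-preserving shuffling idea, the sketch below permutes the rows of a toy token matrix and checks that the Frobenius norm is unchanged, since a permutation only reorders entries. The function names (`shuffle_tokens`, `frobenius_norm`, `guided_prediction`) are hypothetical and not taken from the released code. The CFG-style extrapolation in `guided_prediction` is an assumption about how a perturbed prediction might be combined with the unperturbed one; the abstract does not state the exact guidance formula.

```python
import math
import random


def frobenius_norm(tokens):
    """Frobenius norm of a token matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in tokens for x in row))


def shuffle_tokens(tokens, rng):
    """Reorder tokens (rows) with a random permutation.

    Applying a permutation matrix only reorders rows, so the overall
    norm of the token matrix is preserved.
    """
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    return [tokens[i] for i in perm], perm


def guided_prediction(pred, pred_perturbed, scale):
    """Assumed CFG-style extrapolation away from the perturbed prediction.

    This combination rule is an illustrative assumption, not a formula
    confirmed by the abstract.
    """
    return [p + scale * (p - q) for p, q in zip(pred, pred_perturbed)]


rng = random.Random(0)
# Toy stand-in for intermediate token representations: 16 tokens, 8 channels.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
shuffled, perm = shuffle_tokens(tokens, rng)

print(math.isclose(frobenius_norm(tokens), frobenius_norm(shuffled)))
```

In an actual diffusion network the shuffle would presumably be applied to intermediate activations during an extra forward pass, with the resulting prediction playing the role of `pred_perturbed`; that wiring is likewise an assumption here.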
