Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Abstract

Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
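The decomposition described in the abstract can be illustrated with a minimal sketch on flat vectors. This is not the authors' implementation: the function names (`project`, `guided_prediction`) and the `parallel_weight` parameter are hypothetical, predictions are plain Python lists rather than image tensors, and the rescaling and momentum steps are omitted because the abstract gives no details about them. The sketch assumes the standard CFG form, in which the guided prediction equals the conditional prediction plus (guidance scale minus 1) times the difference between the conditional and unconditional predictions.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(update, reference):
    """Split `update` into parts parallel and orthogonal to `reference`."""
    ref_sq = dot(reference, reference)
    if ref_sq == 0.0:
        # No direction to project onto: treat the whole update as orthogonal.
        return [0.0] * len(update), list(update)
    coeff = dot(update, reference) / ref_sq
    parallel = [coeff * r for r in reference]
    orthogonal = [u - p for u, p in zip(update, parallel)]
    return parallel, orthogonal


def guided_prediction(pred_cond, pred_uncond, guidance_scale, parallel_weight=1.0):
    """CFG-style update with a down-weighted parallel component.

    `parallel_weight` is a hypothetical name for the down-weighting factor;
    a value of 1.0 recovers the standard CFG update.
    """
    # CFG update term: conditional minus unconditional prediction.
    update = [c - u for c, u in zip(pred_cond, pred_uncond)]
    # Decompose the update with respect to the conditional prediction.
    parallel, orthogonal = project(update, pred_cond)
    scale = guidance_scale - 1.0
    return [
        c + scale * (o + parallel_weight * p)
        for c, o, p in zip(pred_cond, orthogonal, parallel)
    ]


if __name__ == "__main__":
    cond, uncond = [1.0, 0.0], [0.5, -0.5]
    print(guided_prediction(cond, uncond, 3.0, parallel_weight=1.0))  # standard CFG
    print(guided_prediction(cond, uncond, 3.0, parallel_weight=0.0))  # parallel part removed
```

In this toy case, setting `parallel_weight` to 0 leaves the component along the conditional prediction unchanged while the orthogonal component is still amplified, which matches the abstract's description of reducing the term it associates with oversaturation.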
