Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
October 3, 2024
Authors: Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
cs.AI
Abstract
Classifier-free guidance (CFG) is crucial for improving both generation
quality and alignment between the input condition and final output in diffusion
models. While a high guidance scale is generally required to enhance these
aspects, it also causes oversaturation and unrealistic artifacts. In this
paper, we revisit the CFG update rule and introduce modifications to address
this issue. We first decompose the update term in CFG into parallel and
orthogonal components with respect to the conditional model prediction and
observe that the parallel component primarily causes oversaturation, while the
orthogonal component enhances image quality. Accordingly, we propose
down-weighting the parallel component to achieve high-quality generations
without oversaturation. Additionally, we draw a connection between CFG and
gradient ascent and introduce a new rescaling and momentum method for the CFG
update rule based on this insight. Our approach, termed adaptive projected
guidance (APG), retains the quality-boosting advantages of CFG while enabling
the use of higher guidance scales without oversaturation. APG is easy to
implement and introduces practically no additional computational overhead to
the sampling process. Through extensive experiments, we demonstrate that APG is
compatible with various conditional diffusion models and samplers, leading to
improved FID, recall, and saturation scores while maintaining precision
comparable to CFG, making our method a superior plug-and-play alternative to
standard classifier-free guidance.
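The parallel/orthogonal decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `eta` weight on the parallel component, and the treatment of model predictions as flattened lists of floats are all assumptions made for illustration, and the rescaling and momentum steps are omitted because the abstract does not specify them. The sketch relies only on the standard CFG rule, which can be written as the conditional prediction plus (scale - 1) times the difference between the conditional and unconditional predictions.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(delta, ref):
    """Split delta into parts parallel and orthogonal to ref.

    Assumes ref is not the zero vector.
    """
    coeff = dot(delta, ref) / dot(ref, ref)
    parallel = [coeff * r for r in ref]
    orthogonal = [d - p for d, p in zip(delta, parallel)]
    return parallel, orthogonal


def projected_guidance(pred_cond, pred_uncond, scale, eta=0.0):
    """Hypothetical projected-guidance step (illustrative only).

    eta is an assumed name for the weight on the parallel component:
    eta = 1.0 recovers standard CFG, eta < 1.0 down-weights the
    component parallel to the conditional prediction.
    """
    delta = [c - u for c, u in zip(pred_cond, pred_uncond)]
    parallel, orthogonal = project(delta, pred_cond)
    return [
        c + (scale - 1.0) * (o + eta * p)
        for c, o, p in zip(pred_cond, orthogonal, parallel)
    ]


if __name__ == "__main__":
    cond = [1.0, 2.0, 2.0]      # toy conditional prediction
    uncond = [0.5, 1.0, 3.0]    # toy unconditional prediction
    print(projected_guidance(cond, uncond, scale=7.5, eta=0.0))
```

Setting `eta=1.0` makes the result identical to standard CFG, which gives a quick sanity check; in practice the predictions would be image-shaped tensors rather than Python lists.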