Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model
May 27, 2025
Authors: Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
cs.AI
Abstract
Negative guidance -- explicitly suppressing unwanted attributes -- remains a
fundamental challenge in diffusion models, particularly in few-step sampling
regimes. While Classifier-Free Guidance (CFG) works well in standard settings,
it fails under aggressive sampling step compression due to divergent
predictions between positive and negative branches. We present Normalized
Attention Guidance (NAG), an efficient, training-free mechanism that applies
extrapolation in attention space with L1-based normalization and refinement.
NAG restores effective negative guidance where CFG collapses while maintaining
fidelity. Unlike existing approaches, NAG generalizes across architectures
(UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image,
video), functioning as a universal plug-in with minimal computational
overhead. Through extensive experimentation, we demonstrate consistent
improvements in text alignment (CLIP Score), fidelity (FID, PFID), and
human-perceived quality (ImageReward). Our ablation studies validate each
design component, while user studies confirm significant preference for
NAG-guided outputs. As a model-agnostic inference-time approach requiring no
retraining, NAG provides effortless negative guidance for all modern diffusion
frameworks -- pseudocode in the Appendix!
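The abstract names three ingredients: extrapolation in attention space, L1-based normalization, and refinement. The following minimal PyTorch sketch shows how such a step might compose inside an attention layer; the function name `nag_attention_guidance`, the hyperparameters `guidance_scale`, `tau`, and `alpha`, and their default values are illustrative assumptions, not the authors' exact algorithm (their pseudocode is in the paper's Appendix).

```python
import torch

def nag_attention_guidance(z_pos: torch.Tensor,
                           z_neg: torch.Tensor,
                           guidance_scale: float = 5.0,
                           tau: float = 2.5,
                           alpha: float = 0.25) -> torch.Tensor:
    """Hypothetical attention-space negative guidance step.

    z_pos, z_neg: attention outputs of shape (batch, tokens, dim) from
    the positive- and negative-prompt branches. All names and defaults
    here are assumptions for illustration.
    """
    # Extrapolate the attention output away from the negative branch.
    z_ext = z_pos + guidance_scale * (z_pos - z_neg)

    # L1-based normalization: if a token's L1 norm grew by more than a
    # factor of tau relative to the positive branch, rescale it back,
    # limiting how far extrapolation can push features off-manifold.
    norm_pos = z_pos.abs().sum(dim=-1, keepdim=True)
    norm_ext = z_ext.abs().sum(dim=-1, keepdim=True)
    ratio = norm_ext / (norm_pos + 1e-8)
    z_norm = torch.where(ratio > tau, z_ext * (tau / ratio), z_ext)

    # Refinement: blend back toward the positive branch to preserve
    # the fidelity of the unguided prediction.
    return alpha * z_norm + (1.0 - alpha) * z_pos
```

Because the operation touches only attention outputs at inference time, a plug-in of this shape would slot into UNet or DiT attention blocks alike, which is consistent with the architecture-agnostic claim in the abstract.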