Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model
May 27, 2025
Authors: Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
cs.AI
Abstract
Negative guidance -- explicitly suppressing unwanted attributes -- remains a fundamental challenge in diffusion models, particularly in few-step sampling regimes. While Classifier-Free Guidance (CFG) works well in standard settings, it fails under aggressive sampling step compression due to divergent predictions between the positive and negative branches. We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses, while maintaining fidelity. Unlike existing approaches, NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video), functioning as a universal plug-in with minimal computational overhead. Through extensive experimentation, we demonstrate consistent improvements in text alignment (CLIP Score), fidelity (FID, PFID), and human-perceived quality (ImageReward). Our ablation studies validate each design component, while user studies confirm significant preference for NAG-guided outputs. As a model-agnostic inference-time approach requiring no retraining, NAG provides effortless negative guidance for all modern diffusion frameworks -- pseudocode in the Appendix!
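
The abstract describes NAG's mechanism in three steps: extrapolate away from the negative branch in attention space, constrain the result with an L1-based normalization, and refine it back toward the positive branch. Below is a minimal, hypothetical PyTorch sketch of that three-step structure, assuming attention-layer outputs of shape (batch, tokens, dim); the function name and the hyperparameters `nag_scale`, `tau`, and `alpha` (and their default values) are illustrative assumptions, not the authors' API -- the authoritative pseudocode is in the paper's Appendix.

```python
import torch

def nag_attention_guidance(z_pos: torch.Tensor,
                           z_neg: torch.Tensor,
                           nag_scale: float = 5.0,  # illustrative default
                           tau: float = 2.5,        # illustrative default
                           alpha: float = 0.5       # illustrative default
                           ) -> torch.Tensor:
    """Hypothetical sketch of NAG applied to attention outputs.

    z_pos / z_neg: attention outputs for the positive and negative
    prompts, shape (batch, tokens, dim).
    """
    # Step 1: extrapolate away from the negative branch in attention space.
    z_tilde = z_pos * nag_scale - z_neg * (nag_scale - 1.0)

    # Step 2: L1-based normalization -- cap how far the extrapolated
    # features may drift from the positive branch, measured per token.
    norm_pos = z_pos.abs().sum(dim=-1, keepdim=True)
    norm_tilde = z_tilde.abs().sum(dim=-1, keepdim=True)
    ratio = norm_tilde / (norm_pos + 1e-8)
    z_hat = z_tilde * (torch.clamp(ratio, max=tau) / (ratio + 1e-8))

    # Step 3: refinement -- blend back toward the positive branch
    # to preserve generation fidelity.
    return alpha * z_hat + (1.0 - alpha) * z_pos
```

Note the design choice this sketch reflects: because the guidance acts on attention-layer outputs rather than on the denoiser's final noise prediction (as CFG does), the same hook can be attached to UNet and DiT backbones alike, including in few-step regimes where the abstract reports that CFG's branch predictions diverge.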