

Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

November 10, 2025
Author: Kwanyoung Kim
cs.AI

Abstract

Diffusion models have demonstrated strong generative performance when using guidance methods such as classifier-free guidance (CFG), which enhance output quality by modifying the sampling trajectory. These methods typically improve a target output by intentionally degrading another, often the unconditional output, using heuristic perturbation functions such as identity mixing or blurred conditions. However, these approaches lack a principled foundation and rely on manually designed distortions. In this work, we propose Adversarial Sinkhorn Attention Guidance (ASAG), a novel method that reinterprets attention scores in diffusion models through the lens of optimal transport and intentionally disrupts the transport cost via the Sinkhorn algorithm. Instead of naively corrupting the attention mechanism, ASAG injects an adversarial cost within self-attention layers to reduce pixel-wise similarity between queries and keys. This deliberate degradation weakens misleading attention alignments and leads to improved conditional and unconditional sample quality. ASAG shows consistent improvements in text-to-image diffusion and enhances controllability and fidelity in downstream applications such as IP-Adapter and ControlNet. The method is lightweight, plug-and-play, and improves reliability without requiring any model retraining.
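The abstract describes Sinkhorn-normalized self-attention with an adversarial cost injected to weaken the strongest query-key alignments. The sketch below illustrates one way such a mechanism could look; it is not the authors' implementation. The function name `sinkhorn_attention`, the entropic regularization `eps`, the iteration count, the `adv_scale` perturbation, and the CFG-style extrapolation at the end are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the paper's code): attention weights obtained by
# Sinkhorn iterations on a transport cost, with an optional adversarial cost
# term that penalizes high query-key similarity.
import torch


def sinkhorn_attention(q, k, v, n_iters=5, eps=0.05, adv_scale=0.0):
    """q, k, v: (batch, tokens, dim). adv_scale=0.0 gives plain Sinkhorn attention;
    adv_scale>0 raises the cost where similarity is high, flattening the
    strongest (possibly misleading) alignments."""
    d = q.shape[-1]
    sim = torch.einsum("bnd,bmd->bnm", q, k) / d ** 0.5  # query-key similarity
    cost = -sim                                          # transport cost = negative similarity
    if adv_scale > 0:
        cost = cost + adv_scale * sim                    # adversarial cost injection (assumed form)

    # Entropic OT: log-domain Sinkhorn iterations with uniform marginals.
    log_kernel = -cost / eps
    log_u = torch.zeros_like(log_kernel[..., 0])         # (b, n)
    log_v = torch.zeros_like(log_kernel[..., 0, :])      # (b, m)
    for _ in range(n_iters):
        log_u = -torch.logsumexp(log_kernel + log_v[:, None, :], dim=-1)
        log_v = -torch.logsumexp(log_kernel + log_u[:, :, None], dim=-2)
    plan = torch.exp(log_kernel + log_u[:, :, None] + log_v[:, None, :])

    # Row-normalize so each query's weights sum to one, then attend.
    attn = plan / plan.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return torch.einsum("bnm,bmd->bnd", attn, v)


if __name__ == "__main__":
    b, n, d = 2, 64, 32
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    clean = sinkhorn_attention(q, k, v)                      # undisturbed branch
    degraded = sinkhorn_attention(q, k, v, adv_scale=1.0)    # adversarially degraded branch
    w = 3.0                                                  # guidance scale (assumed)
    # CFG-style extrapolation away from the degraded branch. In the actual
    # method the extrapolation would act on full denoiser outputs computed
    # with and without the adversarial cost; this is illustrative only.
    guided = degraded + w * (clean - degraded)
```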