TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
October 6, 2025
Authors: Hyunmin Cho, Donghoon Ahn, Susung Hong, Jee Eun Kim, Seungryong Kim, Kyong Hwan Jin
cs.AI
Abstract
Recent diffusion models achieve state-of-the-art performance in image
generation, but often suffer from semantic inconsistencies or hallucinations.
While various inference-time guidance methods can enhance generation, they
often operate indirectly by relying on external signals or architectural
modifications, which introduces additional computational overhead. In this
paper, we propose Tangential Amplifying Guidance (TAG), a more efficient and
direct guidance method that operates solely on trajectory signals without
modifying the underlying diffusion model. TAG leverages an intermediate sample
as a projection basis and amplifies the tangential components of the estimated
scores with respect to this basis to correct the sampling trajectory. We
formalize this guidance process by leveraging a first-order Taylor expansion,
which demonstrates that amplifying the tangential component steers the state
toward higher-probability regions, thereby reducing inconsistencies and
enhancing sample quality. TAG is a plug-and-play, architecture-agnostic module
that improves diffusion sampling fidelity with minimal additional computation,
offering a new perspective on diffusion guidance.
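
The projection-and-amplify step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `amplify_tangential`, the gain parameter `eta` and its default value, and the use of flat Python lists in place of image tensors are all assumptions made for illustration. The abstract also does not specify where in the sampler this step is applied, so the sketch shows only the vector decomposition itself.

```python
import math


def amplify_tangential(x, score, eta=1.5, eps=1e-12):
    """Scale the part of `score` that is orthogonal to the intermediate sample `x`.

    x     : intermediate sample, flattened to a list of floats (the projection basis)
    score : estimated score at x, same length as x
    eta   : hypothetical amplification factor; eta = 1.0 leaves the score unchanged
    """
    # Unit vector along the intermediate sample.
    norm = math.sqrt(sum(v * v for v in x))
    unit = [v / (norm + eps) for v in x]

    # Component of the score parallel to x.
    coeff = sum(s * n for s, n in zip(score, unit))
    parallel = [coeff * n for n in unit]

    # Tangential component: what remains after removing the parallel part.
    tangential = [s - p for s, p in zip(score, parallel)]

    # Recombine, amplifying only the tangential part.
    return [p + eta * t for p, t in zip(parallel, tangential)]
```

For example, with `x = [1.0, 0.0]` and `score = [2.0, 3.0]`, the parallel component is `[2.0, 0.0]` and the tangential component is `[0.0, 3.0]`, so a gain of `eta = 2.0` doubles only the second coordinate. In a real sampler the same arithmetic would be done with tensor operations per sample.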