TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
October 6, 2025
Authors: Hyunmin Cho, Donghoon Ahn, Susung Hong, Jee Eun Kim, Seungryong Kim, Kyong Hwan Jin
cs.AI
Abstract
Recent diffusion models achieve state-of-the-art performance in image
generation, but often suffer from semantic inconsistencies or hallucinations.
While various inference-time guidance methods can enhance generation, they
often operate indirectly by relying on external signals or architectural
modifications, which introduces additional computational overhead. In this
paper, we propose Tangential Amplifying Guidance (TAG), a more efficient and
direct guidance method that operates solely on trajectory signals without
modifying the underlying diffusion model. TAG leverages an intermediate sample
as a projection basis and amplifies the tangential components of the estimated
scores with respect to this basis to correct the sampling trajectory. We
formalize this guidance process by leveraging a first-order Taylor expansion,
which demonstrates that amplifying the tangential component steers the state
toward higher-probability regions, thereby reducing inconsistencies and
enhancing sample quality. TAG is a plug-and-play, architecture-agnostic module
that improves diffusion sampling fidelity with minimal additional computation,
offering a new perspective on diffusion guidance.
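
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It splits an estimated score into a component parallel to the intermediate sample and a tangential (orthogonal) remainder, then rescales only the tangential part. The function name `amplify_tangential`, the gain `eta`, and the choice to apply the operation directly to the score are assumptions made for illustration; the abstract does not specify the exact quantity that is projected, the amplification schedule, or how the result enters the sampler update.

```python
import numpy as np


def amplify_tangential(x, score, eta=1.5, eps=1e-12):
    """Scale the part of `score` orthogonal to `x` by `eta`.

    Hypothetical helper: `eta` and `eps` are illustrative parameters,
    not values taken from the paper.
    """
    x_flat = np.asarray(x, dtype=float).reshape(-1)
    s_flat = np.asarray(score, dtype=float).reshape(-1)

    # Unit vector along the intermediate sample (the projection basis).
    unit = x_flat / (np.linalg.norm(x_flat) + eps)

    # Component of the score parallel to the sample.
    radial = np.dot(s_flat, unit) * unit

    # Remainder, orthogonal to the sample.
    tangential = s_flat - radial

    # Keep the parallel part as is and amplify the tangential part.
    guided = radial + eta * tangential
    return guided.reshape(np.shape(score))


if __name__ == "__main__":
    x_t = np.array([1.0, 0.0])       # toy intermediate sample
    s_t = np.array([0.5, 2.0])       # toy estimated score
    print(amplify_tangential(x_t, s_t, eta=1.5))
```

With `eta = 1` the score is returned unchanged, so the gain acts as a single knob on top of an existing sampler. Because the operation uses only the current sample and the score already computed at that step, it adds no extra network evaluations in this sketch.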