Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

March 12, 2025
Authors: Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan
cs.AI

Abstract

Latent Diffusion Models (LDMs) are known to have an unstable generation process: even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their use in applications requiring consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant. While introducing anti-aliasing operations can partially improve shift-equivariance, significant aliasing and inconsistency persist due to challenges unique to LDMs, including 1) aliasing amplification during VAE training and multiple U-Net inferences, and 2) self-attention modules that inherently lack shift-equivariance. To address these issues, we redesign the attention modules to be shift-equivariant and propose an equivariance loss that effectively suppresses the frequency bandwidth of the features in the continuous domain. The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is also robust to irregular warping. Extensive experiments demonstrate that AF-LDM produces significantly more consistent results than vanilla LDM across various applications, including video editing and image-to-image translation. Code is available at: https://github.com/SingleZombie/AFLDM
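
For intuition, the sketch below shows one plausible way to phrase a fractional-shift equivariance objective in PyTorch: a sub-pixel translation implemented with the Fourier shift theorem, and a loss comparing encode-then-shift against shift-then-encode. This is a minimal illustration under stated assumptions, not the paper's released implementation; `fractional_shift`, `equivariance_loss`, and the size-preserving `model` are placeholders introduced here for the example.

```python
# Minimal sketch of a fractional-shift equivariance check (illustrative only).
import torch
import torch.nn.functional as F

def fractional_shift(x: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
    """Circularly shift a (B, C, H, W) tensor by a sub-pixel offset
    using the Fourier shift theorem: X(f) -> X(f) * exp(-2*pi*i*f*s)."""
    _, _, H, W = x.shape
    fy = torch.fft.fftfreq(H, device=x.device)  # vertical frequencies
    fx = torch.fft.fftfreq(W, device=x.device)  # horizontal frequencies
    phase = torch.exp(-2j * torch.pi * (fy[:, None] * dy + fx[None, :] * dx))
    return torch.fft.ifft2(torch.fft.fft2(x) * phase).real

def equivariance_loss(model, x: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
    """Penalize the gap between model(shift(x)) and shift(model(x)).
    Assumes `model` preserves spatial resolution; a rescaled shift would
    be needed for an encoder that downsamples (e.g. a VAE latent space)."""
    return F.mse_loss(
        model(fractional_shift(x, dx, dy)),
        fractional_shift(model(x), dx, dy),
    )

# Example usage (hypothetical encoder):
# enc = MyEncoder()                      # any size-preserving conv net
# x = torch.randn(1, 3, 64, 64)
# print(equivariance_loss(enc, x, dx=0.5, dy=0.0).item())
```

Note that the FFT-based shift is exact only for signals that are band-limited (and it wraps around the borders), which is consistent with the abstract's motivation for a loss that suppresses the frequency bandwidth of features in the continuous domain.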
