

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

December 15, 2023
作者: Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang
cs.AI

Abstract

One of the key components within diffusion models is the UNet used for noise prediction. While several works have explored basic properties of the UNet decoder, its encoder largely remains unexplored. In this work, we conduct the first comprehensive study of the UNet encoder. We empirically analyze the encoder features and provide insights into important questions regarding how they change during the inference process. In particular, we find that encoder features change gently, whereas decoder features exhibit substantial variations across different time-steps. This finding inspired us to omit the encoder at certain adjacent time-steps and cyclically reuse the encoder features from previous time-steps in the decoder. Building on this observation, we introduce a simple yet effective encoder propagation scheme to accelerate diffusion sampling for a diverse set of tasks. Benefiting from our propagation scheme, we are able to run the decoder in parallel at certain adjacent time-steps. Additionally, we introduce a prior noise injection method to improve the texture details in the generated images. Besides the standard text-to-image task, we also validate our approach on other tasks: text-to-video, personalized generation, and reference-guided generation. Without utilizing any knowledge distillation technique, our approach accelerates sampling for both the Stable Diffusion (SD) and DeepFloyd-IF models by 41% and 24% respectively, while maintaining high-quality generation performance. Our code is available at https://github.com/hutaiHang/Faster-Diffusion.
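The encoder-propagation idea described in the abstract can be sketched in a few lines: evaluate the encoder only at selected "key" time-steps and reuse its cached features at the steps in between, while the decoder still runs every step. The `encoder`, `decoder`, and `key_steps` below are illustrative stand-ins, not the paper's actual UNet modules or schedule:

```python
# Minimal sketch of encoder propagation during diffusion sampling.
# encoder/decoder are toy stand-ins for the real UNet halves; key_steps
# is a hypothetical set of time-steps where the encoder actually runs.

calls = {"encoder": 0}  # track how many full encoder passes we perform

def encoder(x, t):
    """Stand-in for the UNet encoder: produces features for time-step t."""
    calls["encoder"] += 1
    return [xi + t for xi in x]

def decoder(feats, t):
    """Stand-in for the UNet decoder: predicts 'noise' from features."""
    return [f * 0.5 for f in feats]

def sample(x, timesteps, key_steps):
    """Sampling loop that skips the encoder at non-key time-steps,
    cyclically reusing the features cached at the last key step."""
    cached = None
    for t in timesteps:
        if t in key_steps or cached is None:
            cached = encoder(x, t)   # full forward pass at a key step
        noise = decoder(cached, t)   # decoder runs at every step
        x = [xi - n for xi, n in zip(x, noise)]
    return x
```

Because the cached features are identical across a run of non-key steps, the decoder calls for those steps are independent of each other, which is what allows them to be executed in parallel as the abstract describes.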