Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment
June 18, 2024
Authors: Yiheng Li, Heyang Jiang, Akio Kodaira, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu
cs.AI
Abstract
In this paper, we point out that suboptimal noise-data mapping leads to slow training of diffusion models. During diffusion training, current methods diffuse each image across the entire noise space, resulting in a mixture of all images at every point in the noise layer. We emphasize that this random mixture of noise-data mapping complicates the optimization of the denoising function in diffusion models. Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion, a simple and effective method to improve the random mixture of noise-data mapping. In physics, miscibility varies with the intermolecular forces involved; immiscibility thus means that the mixed molecular sources remain distinguishable. Inspired by this, we propose an assignment-then-diffusion training strategy. Specifically, before diffusing the image data into noise, we assign a diffusion target noise to each image by minimizing the total image-noise pair distance within a mini-batch. The assignment acts analogously to an external force that separates the diffuse-able areas of images, thus mitigating the inherent difficulties in diffusion training. Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image while preserving the Gaussian distribution of noise. This ensures that each image is projected only to nearby noise. To address the high complexity of the assignment algorithm, we employ a quantized-assignment method that reduces the computational overhead to a negligible level. Experiments demonstrate that our method achieves up to 3x faster training for consistency models and DDIM on the CIFAR dataset, and up to 1.3x faster training for consistency models on the CelebA dataset. In addition, we conduct a thorough analysis of Immiscible Diffusion, which sheds light on how it improves diffusion training speed while also improving fidelity.
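To make the assignment-then-diffusion step concrete, below is a minimal Python sketch of how the mini-batch noise assignment might look. It is an illustration under stated assumptions, not the authors' released code: the function name `assign_noise`, the L2 distance metric, and the float16 quantization detail are assumptions; `scipy.optimize.linear_sum_assignment` (the Hungarian algorithm) is a standard solver for minimizing the total pair distance.

```python
# A minimal sketch of the assignment-then-diffusion step, assuming an L2
# cost and float16 quantization (both illustrative, not confirmed details).
import torch
from scipy.optimize import linear_sum_assignment


def assign_noise(images: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Permute `noise` so each image is paired with a nearby noise sample.

    Both tensors have shape (B, C, H, W). Since we only reorder i.i.d.
    Gaussian samples, the marginal distribution of the noise is preserved.
    """
    b = images.shape[0]
    # Quantize to float16 before computing distances, mirroring the
    # quantized-assignment idea that keeps the solver overhead negligible.
    x = images.reshape(b, -1).half().float()
    e = noise.reshape(b, -1).half().float()
    dist = torch.cdist(x, e)  # (B, B) pairwise L2 distances
    # Hungarian assignment: one noise sample per image, minimum total cost.
    _, cols = linear_sum_assignment(dist.cpu().numpy())
    return noise[torch.from_numpy(cols)]


# Usage: one extra line in the training loop before forward diffusion.
imgs = torch.randn(64, 3, 32, 32)  # stand-in for a mini-batch of images
eps = torch.randn_like(imgs)       # i.i.d. Gaussian noise
eps = assign_noise(imgs, eps)      # assign, then diffuse x_t as usual
```

Because the permutation only reorders samples drawn from the same Gaussian, the noise distribution seen by the model is unchanged; the assignment merely restricts which noise each image diffuses toward.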