Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment
June 18, 2024
Authors: Yiheng Li, Heyang Jiang, Akio Kodaira, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu
cs.AI
Abstract
In this paper, we point out that suboptimal noise-data mapping leads to slow
training of diffusion models. During diffusion training, current methods
diffuse each image across the entire noise space, resulting in a mixture of all
images at every point in the noise layer. We emphasize that this random mixture
of noise-data mapping complicates the optimization of the denoising function in
diffusion models. Drawing inspiration from the immiscible phenomenon in
physics, we propose Immiscible Diffusion, a simple and effective method to
improve the random mixture of noise-data mapping. In physics, miscibility can
vary according to various intermolecular forces. Thus, immiscibility means that
the mixing of the molecular sources is distinguishable. Inspired by this, we
propose an assignment-then-diffusion training strategy. Specifically, prior to
diffusing the image data into noise, we assign diffusion target noise for the
image data by minimizing the total image-noise pair distance in a mini-batch.
The assignment functions analogously to external forces to separate the
diffuse-able areas of images, thus mitigating the inherent difficulties in
diffusion training. Our approach is remarkably simple, requiring only one line
of code to restrict the diffuse-able area for each image while preserving the
Gaussian distribution of noise. This ensures that each image is projected only
to nearby noise. To address the high complexity of the assignment algorithm, we
employ a quantized-assignment method to reduce the computational overhead to a
negligible level. Experiments demonstrate that our method achieves up to 3x
faster training for consistency models and DDIM on the CIFAR dataset, and up to
1.3x faster training for consistency models on the CelebA dataset. Besides, we
conduct a thorough analysis of Immiscible Diffusion, which sheds light on how
it improves diffusion training speed while also improving fidelity.
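The assignment-then-diffusion step described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes a Hungarian (linear-sum) assignment over flattened mini-batches, and the helper name `assign_noise` is hypothetical. The paper notes that its actual implementation uses a quantized assignment to keep the overhead negligible.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_noise(images: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Reorder a batch of Gaussian noise so each image is paired with
    nearby noise, minimizing the total image-noise pair distance.

    Hypothetical sketch of the assignment step; `images` and `noise`
    are (batch, ...) arrays of the same shape.
    """
    # Pairwise squared L2 distances between flattened images and noise.
    x = images.reshape(len(images), -1)
    e = noise.reshape(len(noise), -1)
    cost = ((x[:, None, :] - e[None, :, :]) ** 2).sum(axis=-1)
    # Hungarian assignment yields a permutation of the noise batch, so
    # the noise marginal stays Gaussian; only the pairing changes.
    _, cols = linear_sum_assignment(cost)
    return noise[cols]
```

Because the result is a permutation of the original noise batch, the total image-noise distance after assignment can never exceed that of the default random pairing, which is the sense in which each image is diffused only toward nearby noise.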