

Analyzing and Improving the Training Dynamics of Diffusion Models

December 5, 2023
Authors: Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine
cs.AI

Abstract

Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets. In this paper, we identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture, without altering its high-level structure. Observing uncontrolled magnitude changes and imbalances in both the network activations and weights over the course of training, we redesign the network layers to preserve activation, weight, and update magnitudes on expectation. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity. Our modifications improve the previous record FID of 2.41 in ImageNet-512 synthesis to 1.81, achieved using fast deterministic sampling. As an independent contribution, we present a method for setting the exponential moving average (EMA) parameters post-hoc, i.e., after completing the training run. This allows precise tuning of EMA length without the cost of performing several training runs, and reveals its surprising interactions with network architecture, training time, and guidance.
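The layer redesign described in the abstract hinges on keeping activation, weight, and update magnitudes at unit scale in expectation. Below is a minimal PyTorch-style sketch of one such magnitude-preserving linear layer, loosely modeled on the forced-weight-normalization idea; the class name MPLinear, the normalize helper, and the epsilon value are illustrative assumptions, not a verbatim copy of the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize(w, eps=1e-4):
    # Rescale each output unit's weight vector to unit L2 norm. For
    # unit-variance, uncorrelated inputs, a unit-norm row yields a
    # unit-variance output, which is the magnitude-preserving property.
    norm = torch.linalg.vector_norm(w, dim=list(range(1, w.ndim)), keepdim=True)
    return w / norm.clamp_min(eps)

class MPLinear(nn.Module):
    """Magnitude-preserving linear layer (illustrative sketch)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Bias-free; weights start at unit norm per output unit.
        self.weight = nn.Parameter(normalize(torch.randn(out_features, in_features)))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                # Forced renormalization of the stored parameters: undoes any
                # magnitude drift from gradient updates, so the relative size
                # of each update stays constant over training.
                self.weight.copy_(normalize(self.weight))
        # Normalize again inside the graph so gradients act on the
        # direction of the weights rather than their magnitude.
        return F.linear(x, normalize(self.weight))
```

Applying this kind of constraint uniformly, rather than per-layer ad hoc, is what the abstract refers to as "systematic application of this philosophy."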
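The post-hoc EMA contribution rests on the observation that the EMA length need not be fixed before training: given snapshots saved along the way, an average of any length can be synthesized afterwards as a weighted combination of them. The toy sketch below weights raw periodic checkpoints directly, whereas the paper stores snapshots pre-averaged under power-function profiles and solves a small least-squares system; the function names and the lumping scheme here are simplifying assumptions.

```python
import numpy as np

def ema_profile(num_steps, halflife):
    # Contribution of each training step to a standard EMA with the given
    # half-life (in steps), truncated to the finite run and renormalized.
    beta = 0.5 ** (1.0 / halflife)
    ages = num_steps - 1 - np.arange(num_steps)   # 0 = final step
    w = (1.0 - beta) * beta ** ages
    return w / w.sum()

def posthoc_ema(checkpoints, checkpoint_steps, num_steps, halflife):
    # Synthesize the EMA after the fact: each saved checkpoint stands in
    # for all steps since the previous one and absorbs their EMA weight.
    w = ema_profile(num_steps, halflife)
    out = np.zeros_like(checkpoints[0])
    prev = 0
    for ckpt, step in zip(checkpoints, checkpoint_steps):
        out += w[prev:step + 1].sum() * ckpt
        prev = step + 1
    return out

# Usage (hypothetical file names): sweep EMA half-lives from one run's
# checkpoints instead of retraining.
# steps = list(range(99, 100_000, 100))
# thetas = [np.load(f"ckpt_{s}.npy") for s in steps]
# ema = posthoc_ema(thetas, steps, num_steps=100_000, halflife=5_000)
```

Because sweeping the EMA length becomes a cheap post-processing step rather than a set of separate training runs, it becomes practical to study how EMA length interacts with architecture, training time, and guidance, as the abstract notes.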