
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

October 20, 2023
作者: Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
cs.AI

Abstract

In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs) between distant network blocks aggregate long-range information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement brought by LSC scaling, remains absent. To address this, we theoretically show that the coefficients of LSCs in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are in fact large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed inputs and predicts outputs far from the desired ones, yielding an oscillatory loss and thus oscillatory gradients. Besides, we also establish the theoretical benefits of LSC coefficient scaling for the stability of hidden features and gradients, as well as for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient scaling framework that scales the coefficients of LSCs in UNet and thereby improves its training stability. Experimental results on four well-known datasets show that our methods are superior at stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
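The core idea of scaling LSC coefficients can be illustrated with a minimal sketch. The toy model below is a hypothetical simplification, not the paper's implementation: a small encoder-decoder in NumPy where the i-th long skip connection (counting from the outermost) is scaled by a constant factor kappa**i before being merged into the decoder, so deeper skips contribute less and the forward signal is damped. The additive merge, the tanh "block", and the choice kappa=0.7 are all illustrative assumptions.

```python
import numpy as np

def toy_unet_forward(x, enc_weights, dec_weights, kappa=0.7):
    """Toy UNet-style forward pass with scaled long skip connections.

    The i-th LSC is multiplied by kappa**i (a constant-scaling scheme,
    hypothetical simplification of the coefficient scaling described
    in the abstract) instead of the default coefficient 1.
    """
    skips = []
    h = x
    # Encoder: store a scaled copy of each hidden feature for its LSC.
    for i, w in enumerate(enc_weights):
        h = np.tanh(w @ h)                      # stand-in for a UNet block
        skips.append((kappa ** (i + 1)) * h)    # scale the skip feature
    # Decoder: merge the scaled skip features back in (additive merge).
    for w, s in zip(dec_weights, reversed(skips)):
        h = np.tanh(w @ (h + s))
    return h

rng = np.random.default_rng(0)
d = 8
enc = [rng.standard_normal((d, d)) for _ in range(3)]
dec = [rng.standard_normal((d, d)) for _ in range(3)]
x = rng.standard_normal(d)
out = toy_unet_forward(x, enc, dec, kappa=0.7)
```

Setting kappa=1.0 recovers plain unscaled skip connections, so the damping effect of a smaller coefficient can be observed by comparing the two settings.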