ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
October 20, 2023
Authors: Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
cs.AI
Abstract
In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-range information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients down. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement brought by LSC scaling, is still lacking. To address this issue, we theoretically show that the coefficients of the LSCs in UNet have a large effect on the stability of forward and backward propagation and on the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are in fact large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed inputs and predicts outputs distant from the desired ones, yielding an oscillatory loss and thus oscillatory gradients. We also establish the theoretical benefits of scaling the LSC coefficients of UNet for the stability of the hidden features and gradients and for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient scaling framework that scales the coefficients of the LSCs in UNet and better improves its training stability. Experimental results on four well-known datasets show that our method is superior in stabilizing training and yields about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
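
To make the coefficient-scaling idea concrete, below is a minimal PyTorch sketch of scaling long skip connections in a toy UNet-like network. Everything here is an illustrative assumption rather than the paper's implementation: the class ToyScaledUNet, its plain-convolution blocks, the additive (rather than concatenated) skips, and the value kappa = 0.7 are inventions of this sketch; only the general idea of geometrically shrinking the skip contributions reflects ScaleLong. See the linked repository for the actual code.

```python
# Sketch: scale each long skip connection by kappa**d (kappa < 1, d = skip depth),
# so skip features are added at reduced strength rather than full strength.
import torch
import torch.nn as nn


class ToyScaledUNet(nn.Module):
    """Toy UNet-like stack whose long skip connections are scaled geometrically."""

    def __init__(self, channels: int = 64, depth: int = 3, kappa: float = 0.7):
        super().__init__()
        self.kappa = kappa  # illustrative value; not prescribed by the abstract
        self.encoders = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(depth)]
        )
        self.middle = nn.Conv2d(channels, channels, 3, padding=1)
        self.decoders = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips = []
        for enc in self.encoders:
            x = torch.relu(enc(x))
            skips.append(x)  # stored feature, reused later via a long skip connection
        x = torch.relu(self.middle(x))
        for i, dec in enumerate(self.decoders):
            d = len(self.decoders) - i  # depth index of the matching skip
            skip = skips.pop()          # innermost skip pairs with the first decoder
            # Shrinking the skip contribution (kappa**d < 1) narrows the
            # oscillation range of hidden features and gradients that the
            # paper's analysis attributes to unscaled LSCs.
            x = torch.relu(dec(x + (self.kappa ** d) * skip))
        return x


# Usage: shapes are preserved, so stability can be probed with random inputs.
net = ToyScaledUNet()
out = net(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Setting kappa = 1.0 in this sketch recovers an ordinary additive skip, which makes it easy to compare gradient magnitudes with and without scaling.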