ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Abstract

In diffusion models, UNet is the most popular network backbone, because its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling down its LSC coefficients. However, a theoretical understanding of UNet's instability in diffusion models, and of the performance improvement brought by LSC scaling, is still lacking. To address this issue, we theoretically show that the LSC coefficients in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed inputs and predicts outputs far from the desired ones, yielding an oscillatory loss and thus oscillatory gradients. We also identify the theoretical benefits of scaling UNet's LSC coefficients for the stability of hidden features and gradients, and for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient scaling framework that scales the LSC coefficients in UNet and improves its training stability more effectively. Experimental results on four well-known datasets show that our methods are superior at stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
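Where the LSC coefficients enter a UNet forward pass can be illustrated with a deliberately small toy. The sketch below is not the ScaleLong implementation (the linked repository has that); it is a minimal pure-Python model under several assumptions: features are plain vectors, each long skip connection is merged additively as `h + c_i * skip_i` (diffusion UNets typically concatenate instead), and the coefficients follow a hypothetical geometric schedule `c_i = kappa**i`. The names `lsc_coefficients`, `toy_unet_forward`, and `kappa` are illustrative only.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def lsc_coefficients(num_skips: int, kappa: float = 1.0) -> List[float]:
    """Hypothetical geometric schedule: LSC i (i = 0 is the outermost) gets kappa**i.

    kappa = 1.0 recovers a plain, unscaled UNet. This schedule is an
    illustrative assumption, not necessarily the rule used by ScaleLong.
    """
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa is expected to lie in (0, 1]")
    return [kappa ** i for i in range(num_skips)]


def toy_unet_forward(x: Vector, encoder: Sequence[Block],
                     decoder: Sequence[Block], coeffs: Sequence[float]) -> Vector:
    """Toy UNet-style forward pass with scaled long skip connections.

    Encoder block i stores its output as skip feature i. Decoder blocks run
    from the deepest level outwards; before decoder block i, skip i is merged
    additively as h + coeffs[i] * skip_i (a simplification of concatenation).
    """
    if not (len(encoder) == len(decoder) == len(coeffs)):
        raise ValueError("need one decoder block and one coefficient per encoder block")
    skips: List[Vector] = []
    h = list(x)
    for block in encoder:
        h = block(h)
        skips.append(h)
    for i in reversed(range(len(encoder))):
        merged = [hj + coeffs[i] * sj for hj, sj in zip(h, skips[i])]
        h = decoder[i](merged)
    return h


def norm(v: Vector) -> float:
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(t * t for t in v))


def identity(v: Vector) -> Vector:
    """Stand-in block that passes features through unchanged."""
    return list(v)


if __name__ == "__main__":
    x = [1.0, -2.0, 2.0]  # norm(x) == 3.0
    depth = 4
    for kappa in (1.0, 0.5):
        coeffs = lsc_coefficients(depth, kappa)
        y = toy_unet_forward(x, [identity] * depth, [identity] * depth, coeffs)
        print(kappa, norm(y) / norm(x))  # 1.0 5.0, then 0.5 2.875
```

With identity blocks every skip feature equals the input, so the decoder output is exactly (1 + sum of the c_i) times x: a factor of 5 for four unscaled connections versus 2.875 with kappa = 0.5. The toy only shows how the coefficients enter the forward pass and how shrinking them damps the accumulation of skip features; it does not reproduce the paper's oscillation analysis.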
