ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Abstract

In diffusion models, UNet is the most popular network backbone, because its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement from LSC scaling, is still lacking. To address this, we theoretically show that the LSC coefficients in UNet have a large effect on the stability of forward and backward propagation and on the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are in fact large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed input and predicts an output far from the desired output, yielding an oscillatory loss and thus oscillatory gradients. We also show the theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients and for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient scaling framework that scales the LSC coefficients in UNet and further improves its training stability. Experimental results on four well-known datasets show that our methods are superior in stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
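To make the idea of scaling LSC coefficients concrete, the following is a minimal sketch in plain Python. It is not the ScaleLong implementation. The toy network, the names toy_unet_forward and l2_norm, the weights, and the geometric schedule kappa ** i with kappa = 0.7 are all assumptions chosen for illustration. The sketch only shows where a per-skip coefficient enters the forward pass and how smaller coefficients shrink the output magnitude in this toy setting.

```python
import math


def toy_unet_forward(x, enc_weights, dec_weights, skip_coeffs):
    """Forward pass of a toy UNet-like stack on a list of floats.

    Each encoder block applies tanh(w * h) elementwise and saves its output.
    Each decoder block adds the saved feature of its mirrored encoder block,
    multiplied by that skip's coefficient, before scaling by its own weight.
    This is a hypothetical toy model, not the architecture from the paper.
    """
    n = len(enc_weights)
    h = list(x)
    saved = []
    for w in enc_weights:                      # encoder path
        h = [math.tanh(w * v) for v in h]
        saved.append(h)
    for i, w in enumerate(dec_weights):        # decoder path
        j = n - 1 - i                          # index of the mirrored encoder block
        h = [w * (a + skip_coeffs[j] * b) for a, b in zip(h, saved[j])]
    return h


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))


# Illustrative values only (assumptions, not taken from the paper).
x = [0.5, 1.0, 1.5]
enc_weights = [1.2, 0.9, 1.1]
dec_weights = [1.3, 1.1, 1.2]

unscaled = [1.0, 1.0, 1.0]                     # standard skips, coefficient 1
kappa = 0.7                                    # assumed scaling base, kappa < 1
scaled = [kappa ** i for i in range(3)]        # assumed geometric schedule

norm_unscaled = l2_norm(toy_unet_forward(x, enc_weights, dec_weights, unscaled))
norm_scaled = l2_norm(toy_unet_forward(x, enc_weights, dec_weights, scaled))
print("output norm, unscaled skips:", norm_unscaled)
print("output norm, scaled skips:  ", norm_scaled)
```

Because every input, weight, and activation in this toy setup is positive, reducing any skip coefficient strictly reduces each output component, so the scaled run prints the smaller norm. A real UNet has signed weights and nonlinear decoder blocks, so this monotonic behaviour should be read as an illustration of the mechanism rather than as evidence for the paper's theoretical results.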
