Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models

Abstract

Classifier-Free Guidance (CFG) is a fundamental technique for training conditional diffusion models. The common practice in CFG-based training is to use a single network to learn both the conditional and the unconditional noise prediction, dropping the condition at a small rate during training. However, we observe that jointly learning the unconditional noise with this limited training bandwidth results in poor priors for the unconditional case. More importantly, these poor unconditional noise predictions are a major cause of degraded quality in conditional generation. Inspired by the fact that most CFG-based conditional models are trained by fine-tuning a base model with better unconditional generation, we first show that simply replacing the unconditional noise in CFG with the noise predicted by the base model can significantly improve conditional generation. Furthermore, we show that a diffusion model other than the one the fine-tuned model was trained from can also be used for unconditional noise replacement. We verify these claims experimentally on a range of CFG-based conditional models for both image and video generation, including Zero-1-to-3, Versatile Diffusion, DiT, DynamiCrafter, and InstructPix2Pix.
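The replacement described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used CFG combination, in which the guided noise is the unconditional prediction plus a scaled difference between the conditional and unconditional predictions; the abstract itself does not spell out this formula. The function names, the toy models, and the guidance scale below are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, scale):
    """Common CFG combination (assumed form, not quoted from the paper):
    start at the unconditional prediction and move toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def guided_noise(finetuned_model, base_model, x_t, t, cond, scale):
    """Conditional noise from the fine-tuned model, unconditional noise from the base model."""
    eps_cond = finetuned_model(x_t, t, cond)  # fine-tuned model, with the condition
    eps_uncond = base_model(x_t, t)           # base model replaces the fine-tuned model's unconditional branch
    return cfg_noise(eps_cond, eps_uncond, scale)

# Hypothetical toy "models": plain functions standing in for real noise-prediction networks.
def toy_finetuned(x_t, t, cond):
    return 0.5 * x_t + cond

def toy_base(x_t, t):
    return 0.5 * x_t

x_t = np.ones((2, 2))            # toy noisy sample
cond = np.full((2, 2), 0.1)      # toy conditioning signal
eps = guided_noise(toy_finetuned, toy_base, x_t, t=10, cond=cond, scale=7.5)
```

In a real sampler, `eps` would feed the usual denoising update at each timestep; only the source of the unconditional prediction changes, so the fine-tuned model itself needs no retraining under this sketch.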
