Conditional Diffusion Distillation

Abstract

Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint learning, largely simplifying the previous two-stage procedures that perform distillation and conditional fine-tuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks, including super-resolution, image editing, and depth-to-image generation, demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.
