Conditional Diffusion Distillation

Abstract

Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint learning, greatly simplifying previous two-stage procedures that perform distillation and conditional fine-tuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks, including super-resolution, image editing, and depth-to-image generation, demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.
