Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

Abstract

In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, and f-distill, within a theory-driven framework that we name \emph{Uni-Instruct}. Uni-Instruct is motivated by our proposed diffusion expansion theory of the f-divergence family. We then introduce key theories that overcome the intractability of the original expanded f-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded f-divergence family. The unification introduced by Uni-Instruct not only offers new theoretical contributions that help explain existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of \emph{1.46} for unconditional generation and \emph{1.38} for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of \emph{1.02}, which outperforms its 79-step teacher diffusion by a significant margin of 1.33 (1.02 vs 2.35). We also apply Uni-Instruct to broader tasks such as text-to-3D generation, where it gives decent results that slightly outperform previous methods, such as SDS and VSD, in terms of both generation quality and diversity. Both the theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transfer of diffusion models.
