Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

Abstract

In this paper, we unify more than 10 existing one-step diffusion distillation approaches, including Diff-Instruct, DMD, SIM, SiD, and f-distill, within a theory-driven framework that we name Uni-Instruct. Uni-Instruct is motivated by our proposed diffusion expansion theory of the f-divergence family. We then introduce key theories that overcome the intractability of the original expanded f-divergence, yielding an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded f-divergence family. The unification introduced by Uni-Instruct not only offers new theoretical contributions that help explain existing approaches from a high-level perspective, but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of 1.46 for unconditional generation and 1.38 for conditional generation. On the ImageNet-64×64 generation benchmark, Uni-Instruct achieves a new state-of-the-art one-step generation FID of 1.02, outperforming its 79-step teacher diffusion model by a significant margin of 1.33 (1.02 vs. 2.35). We also apply Uni-Instruct to broader tasks such as text-to-3D generation, where it gives decent results that slightly outperform previous methods, such as SDS and VSD, in both generation quality and diversity. The solid theoretical and empirical contributions of Uni-Instruct may help future studies on one-step diffusion distillation and knowledge transfer of diffusion models.


