SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Abstract

Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Along with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior distillation performance for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
