SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Abstract

Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior performance in distillation for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
