SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Abstract

Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Along with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior performance in distillation for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
