SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Abstract

Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Along with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior distillation performance for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
