SD3.5-Flash: Distribution-Guided Distillation of Generative Flows

Abstract

We present SD3.5-Flash, an efficient few-step distillation framework that brings high-quality image generation to accessible consumer devices. Our approach distills computationally prohibitive rectified flow models through a reformulated distribution matching objective tailored specifically for few-step generation. We introduce two key innovations: "timestep sharing" to reduce gradient noise and "split-timestep fine-tuning" to improve prompt alignment. Combined with comprehensive pipeline optimizations like text encoder restructuring and specialized quantization, our system enables both rapid generation and memory-efficient deployment across different hardware configurations. This democratizes access across the full spectrum of devices, from mobile phones to desktop computers. Through extensive evaluation including large-scale user studies, we demonstrate that SD3.5-Flash consistently outperforms existing few-step methods, making advanced generative AI truly accessible for practical deployment.
