ProReflow: Progressive Reflow with Decomposed Velocity

Abstract

Diffusion models have achieved significant progress in both image and video generation, but they still suffer from high computational costs. As an effective solution, flow matching aims to reflow the diffusion process into a straight line, enabling few-step and even one-step generation. In this paper, however, we argue that the original training pipeline of flow matching is not optimal, and we introduce two techniques to improve it. First, we introduce progressive reflow, which reflows the diffusion model progressively over local timestep ranges until the whole diffusion process is covered, reducing the difficulty of flow matching. Second, we introduce aligned v-prediction, which prioritizes direction matching over magnitude matching in flow matching. Experimental results on SDv1.5 and SDXL demonstrate the effectiveness of our method. For example, on SDv1.5 our method achieves an FID of 10.70 on the MSCOCO2014 validation set with only 4 sampling steps, close to our teacher model (32 DDIM steps, FID = 10.05).
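The abstract does not give formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's implementation. It assumes that "local timesteps" means equal windows over the unit time interval, that the progressive schedule halves the window count at each stage, and that direction and magnitude errors can be measured by a cosine term and a squared norm difference. All function names and the weights `direction_weight` and `magnitude_weight` are hypothetical.

```python
import math


def local_windows(num_windows):
    """Split the unit time interval [0, 1] into equal local windows."""
    edges = [i / num_windows for i in range(num_windows + 1)]
    return list(zip(edges[:-1], edges[1:]))


def progressive_schedule(start_windows, end_windows=1):
    """List the windows used at each stage, halving the count per stage.

    The halving rule is an assumption made for illustration only.
    """
    if end_windows < 1:
        raise ValueError("end_windows must be at least 1")
    stages = []
    k = start_windows
    while k >= end_windows:
        stages.append(local_windows(k))
        k //= 2
    return stages


def decompose_velocity_error(v_pred, v_target, eps=1e-12):
    """Return (direction_error, magnitude_error) between two velocity vectors.

    direction_error is 1 - cosine similarity; magnitude_error is the squared
    difference of the Euclidean norms. Both definitions are assumptions.
    """
    dot = sum(p * t for p, t in zip(v_pred, v_target))
    norm_pred = math.sqrt(sum(p * p for p in v_pred))
    norm_target = math.sqrt(sum(t * t for t in v_target))
    cosine = dot / max(norm_pred * norm_target, eps)
    return 1.0 - cosine, (norm_pred - norm_target) ** 2


def aligned_loss(v_pred, v_target, direction_weight=1.0, magnitude_weight=0.1):
    """Weight the direction error more heavily than the magnitude error."""
    direction_error, magnitude_error = decompose_velocity_error(v_pred, v_target)
    return direction_weight * direction_error + magnitude_weight * magnitude_error
```

With these hypothetical weights, a prediction that points in the wrong direction is penalized more than one that has the right direction but the wrong length, which is the priority the abstract describes for aligned v-prediction.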
