Inductive Moment Matching

Abstract

Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often introduces instability and requires extensive tuning. To resolve these trade-offs, we propose Inductive Moment Matching (IMM), a new class of generative models that enables one- or few-step sampling with a single-stage training procedure. Unlike distillation, IMM requires neither initialization from a pre-trained model nor the optimization of two networks; and unlike Consistency Models, IMM guarantees distribution-level convergence and remains stable across various hyperparameters and standard model architectures. On ImageNet-256x256, IMM surpasses diffusion models with an FID of 1.99 using only 8 inference steps, and on CIFAR-10 it achieves a state-of-the-art 2-step FID of 1.98 for a model trained from scratch.