Stable Consistency Tuning: Understanding and Improving Consistency Models

Abstract

Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or through consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves a 1-step FID of 2.42 and a 2-step FID of 1.55, a new state of the art for consistency models.
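The two technical ingredients named above, a bootstrapped (TD-style) training target and variance reduction through the score identity, can be illustrated with a minimal NumPy sketch. It assumes VE-style noising x_t = x_0 + t·ε, so that the probability-flow ODE is dx/dt = -t ∇ log p_t(x), and it estimates the marginal score by self-normalized weighting over a reference batch of clean samples. That weighting scheme and the helper names `conditional_score`, `score_identity_estimate`, and `euler_step` are hypothetical choices for illustration, not the authors' SCT implementation.

```python
import numpy as np

# Assumption of this sketch (not taken from the paper's code): VE-style noising
# x_t = x_0 + t * eps with eps ~ N(0, I), so the probability-flow ODE is
# dx/dt = -t * grad log p_t(x).


def conditional_score(x_t, x0, t):
    """Score of p(x_t | x_0) = N(x_0, t^2 I), i.e. the single-sample target
    implied by one (x_0, eps) pair. Unbiased for the marginal score, but noisy."""
    return -(x_t - x0) / t**2


def score_identity_estimate(x_t, x0_ref, t):
    """Lower-variance estimate of grad log p_t(x_t) via the score identity
        grad log p_t(x_t) = E_{x_0 ~ p(x_0 | x_t)}[grad log p(x_t | x_0)],
    approximating the posterior p(x_0 | x_t) with self-normalized weights over
    a reference batch x0_ref of shape (B, D). This weighting is an assumed,
    illustrative choice."""
    # log p(x_t | x_0^(i)) up to a constant that is shared by every i.
    log_w = -np.sum((x_t[None, :] - x0_ref) ** 2, axis=1) / (2 * t**2)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Posterior-weighted average of the per-sample conditional scores -> shape (D,).
    return w @ conditional_score(x_t[None, :], x0_ref, t)


def euler_step(x_t, t, r, score):
    """One Euler step of the probability-flow ODE from noise level t down to r < t.
    The result x_r plays the role of the 'next state': a stop-gradient prediction
    f(x_r) would serve as the bootstrapped, TD-style target for f(x_t)."""
    return x_t + (t - r) * t * score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, r = 1.0, 0.8
    x0_ref = rng.standard_normal((4096, 2))  # toy data: p_0 = N(0, I)
    x0, eps = rng.standard_normal(2), rng.standard_normal(2)
    x_t = x0 + t * eps
    true_score = -x_t / (1 + t**2)  # exact marginal score for Gaussian p_0
    print("true score:        ", true_score)
    print("single-sample score:", conditional_score(x_t, x0, t))
    print("score-identity est.:", score_identity_estimate(x_t, x0_ref, t))
    print("x_r (variance-reduced):",
          euler_step(x_t, t, r, score_identity_estimate(x_t, x0_ref, t)))
```

With the single-sample score, the Euler step lands exactly at x_0 + r·ε, so the pair (x_t, x_r) shares the same noise ε; averaging over the reference batch instead gives a lower-variance target, at the cost of some bias when the batch is small, because self-normalized weighting is only consistent, not unbiased.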