Improved Training Technique for Latent Consistency Models

Abstract

Consistency models are a new family of generative models capable of producing high-quality samples in either a single step or multiple steps. Recently, consistency models have demonstrated impressive performance, achieving results on par with diffusion models in the pixel space. However, the success of scaling consistency training to large-scale datasets, particularly for text-to-image and video generation tasks, is determined by performance in the latent space. In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of improved consistency training (iCT) in the latent space. To address this, we replace Pseudo-Huber losses with Cauchy losses, effectively mitigating the impact of outliers. Additionally, we introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance. Lastly, we introduce the adaptive scaling-c scheduler to manage the robust training process and adopt Non-scaling LayerNorm in the architecture to better capture the statistics of the features and reduce outlier impact. With these strategies, we successfully train latent consistency models capable of high-quality sampling with one or two steps, significantly narrowing the performance gap between latent consistency and diffusion models. The implementation is released here: https://github.com/quandao10/sLCT/
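
The effect of swapping Pseudo-Huber for Cauchy losses on outliers can be illustrated with a minimal sketch. The abstract does not give the functional forms, so the two definitions below are commonly used ones and should be read as assumptions; the function names and the choice c = 1.0 are hypothetical and are not taken from the released implementation.

```python
import math


def pseudo_huber(residual_norm: float, c: float) -> float:
    """Pseudo-Huber distance: sqrt(r^2 + c^2) - c (assumed form).

    Grows roughly linearly in r for large residuals.
    """
    return math.sqrt(residual_norm ** 2 + c ** 2) - c


def cauchy(residual_norm: float, c: float) -> float:
    """Cauchy (Lorentzian) distance: log(1 + r^2 / (2 c^2)) (assumed form).

    Grows only logarithmically in r, so large residuals contribute far less.
    """
    return math.log1p(residual_norm ** 2 / (2.0 * c ** 2))


if __name__ == "__main__":
    c = 1.0  # hypothetical scale parameter, chosen only for illustration
    for r in (0.1, 1.0, 10.0, 100.0):
        print(f"r={r:>6}: pseudo_huber={pseudo_huber(r, c):.4f}  cauchy={cauchy(r, c):.4f}")
```

Under these assumed forms, a tenfold increase in a large residual raises the Pseudo-Huber value by about a factor of ten, while the Cauchy value rises by only a small additive amount. That is the sense in which a Cauchy loss limits the influence of impulsive outliers.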
