VideoLCM: Video Latent Consistency Model

Abstract

Consistency models have demonstrated strong capability in efficient image generation, enabling synthesis in only a few sampling steps and alleviating the high computational cost of diffusion models. However, consistency models remain largely unexplored for video generation, which is more challenging and resource-intensive. In this report, we present the VideoLCM framework to fill this gap. It leverages the concept of consistency models from image generation to synthesize videos efficiently in minimal steps while maintaining high quality. VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation techniques to train the latent consistency model. Experimental results demonstrate the effectiveness of VideoLCM in terms of computational efficiency, fidelity, and temporal consistency. Notably, VideoLCM achieves high-fidelity, smooth video synthesis with only four sampling steps, showing the potential for real-time synthesis. We hope that VideoLCM can serve as a simple yet effective baseline for subsequent research. The source code and models will be made publicly available.
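
To make the "four sampling steps" concrete, the following is a minimal sketch of generic multistep consistency sampling, not the VideoLCM implementation. It assumes a cosine-style noise schedule, a flat list of floats in place of a video latent, and a hypothetical `toy_consistency_fn` that stands in for a distilled network. All names and the schedule are illustrative assumptions.

```python
import math
import random


def alpha_bar(t, num_train_steps=1000):
    """Cumulative signal level at timestep t (assumed cosine-style schedule)."""
    return math.cos(0.5 * math.pi * t / num_train_steps) ** 2


def multistep_consistency_sample(consistency_fn, latent_size, timesteps, rng):
    """Few-step sampling: one consistency_fn call per entry in `timesteps`.

    `timesteps` must be in decreasing order, e.g. [999, 749, 499, 249].
    """
    # Start from pure Gaussian noise at the largest timestep.
    x_t = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    x0 = x_t
    for i, t in enumerate(timesteps):
        # One evaluation maps the noisy latent directly to a clean estimate.
        x0 = consistency_fn(x_t, t)
        if i + 1 < len(timesteps):
            # Re-noise the clean estimate to the next, smaller timestep.
            a = alpha_bar(timesteps[i + 1])
            x_t = [
                math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                for v in x0
            ]
    return x0


def toy_consistency_fn(x_t, t):
    """Hypothetical stand-in for a distilled model.

    For data concentrated on a single point (every entry equal to 0.5),
    the exact consistency function returns that point for any input.
    """
    return [0.5] * len(x_t)


if __name__ == "__main__":
    sample = multistep_consistency_sample(
        toy_consistency_fn, latent_size=8,
        timesteps=[999, 749, 499, 249], rng=random.Random(0),
    )
    print(sample)
```

In a real system, the stand-in would be replaced by a consistency model distilled from a latent video diffusion model, and the returned latent would be decoded to frames. The loop structure shows why the cost scales with the number of sampling steps: each step is a single network evaluation.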
