

VideoLCM: Video Latent Consistency Model

December 14, 2023
Authors: Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang
cs.AI

Abstract

Consistency models have demonstrated powerful capabilities in efficient image generation, allowing synthesis within a few sampling steps and alleviating the high computational cost of diffusion models. However, consistency models remain underexplored in the more challenging and resource-intensive setting of video generation. In this report, we present the VideoLCM framework to fill this gap: it leverages the concept of consistency models from image generation to synthesize videos efficiently, in a minimal number of sampling steps, while maintaining high quality. VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation to train the latent consistency model. Experimental results demonstrate the effectiveness of VideoLCM in terms of computational efficiency, fidelity, and temporal consistency. Notably, VideoLCM achieves high-fidelity, smooth video synthesis with only four sampling steps, showcasing its potential for real-time synthesis. We hope VideoLCM can serve as a simple yet effective baseline for subsequent research. The source code and models will be made publicly available.
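The four-step inference the abstract describes follows the general multistep consistency sampling recipe: a consistency model maps a noisy latent at any noise level directly to a clean-sample estimate, and the sampler alternates this jump with re-noising to a lower level. The sketch below is a minimal, runnable illustration of that loop, not the authors' implementation; `consistency_fn` is a hypothetical stand-in (an analytic Gaussian denoiser) for the distilled latent consistency model, and the latent shape and noise schedule are illustrative.

```python
import numpy as np

# Toy data distribution N(MU, DATA_STD^2), so the "model" below can be
# written in closed form and the sampling loop runs end to end.
MU, DATA_STD = 2.0, 0.5

def consistency_fn(x, sigma):
    # Stand-in for the learned consistency model: the posterior mean
    # E[x0 | x_sigma] under x_sigma = x0 + sigma * eps with Gaussian data.
    w = DATA_STD**2 / (DATA_STD**2 + sigma**2)
    return w * x + (1 - w) * MU

def multistep_consistency_sample(shape, sigmas, rng):
    # sigmas: decreasing noise levels; four entries gives 4-step sampling.
    x = rng.standard_normal(shape) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        x0 = consistency_fn(x, sigma)      # jump straight to a clean estimate
        if i + 1 < len(sigmas):
            # Re-noise the estimate down to the next (lower) noise level.
            x = x0 + sigmas[i + 1] * rng.standard_normal(shape)
        else:
            x = x0
    return x

rng = np.random.default_rng(0)
# Illustrative latent shape: (frames, channels, height, width).
samples = multistep_consistency_sample((4, 16, 32, 32), [80.0, 10.0, 2.0, 0.5], rng)
```

With more steps the loop reduces toward ordinary ancestral-style sampling; the point of consistency distillation is that even a handful of jump-and-renoise steps suffices.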