VideoLCM: Video Latent Consistency Model
December 14, 2023
Authors: Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang
cs.AI
Abstract
Consistency models have demonstrated powerful capabilities in efficient image generation, allowing synthesis within a few sampling steps and alleviating the high computational cost of diffusion models. However, consistency models remain less explored in the more challenging and resource-intensive task of video generation. In this report, we present the VideoLCM framework to fill this gap, which leverages the concept of consistency models from image generation to efficiently synthesize videos in a minimal number of steps while maintaining high quality. VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation techniques to train the latent consistency model. Experimental results reveal the effectiveness of our VideoLCM in terms of computational efficiency, fidelity, and temporal consistency. Notably, VideoLCM achieves high-fidelity and smooth video synthesis with only four sampling steps, showcasing its potential for real-time synthesis. We hope that VideoLCM can serve as a simple yet effective baseline for subsequent research. The source code and models will be publicly available.
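
The consistency distillation mentioned in the abstract follows the general recipe of consistency models (Song et al., 2023) and latent consistency models (Luo et al., 2023). As a rough sketch in our own notation (not necessarily the paper's exact formulation), the student f_theta is trained so that its outputs agree across adjacent points of the teacher's probability-flow ODE trajectory:

\mathcal{L}_{\mathrm{CD}}(\theta, \theta^{-}; \psi) = \mathbb{E}_{z, c, n}\Big[ d\big( f_{\theta}(z_{t_{n+1}}, c, t_{n+1}),\; f_{\theta^{-}}(\hat{z}^{\psi}_{t_n}, c, t_n) \big) \Big],

where \hat{z}^{\psi}_{t_n} is the estimate of z_{t_n} obtained by one step of an ODE solver applied to the pretrained teacher video diffusion model \psi starting from z_{t_{n+1}}, \theta^{-} is an exponential moving average of the student weights \theta, c is the text condition, and d(\cdot,\cdot) is a distance metric. The choice of solver and distance here is an assumption, not a detail confirmed by the abstract.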
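To make the "four sampling steps" claim concrete, the following is a minimal sketch of standard multistep consistency sampling in latent space, the kind of loop such a model would use at inference time. The model interface f_theta(z_t, t, c) -> z_0, the DDPM-style linear beta schedule, and the 4-step timestep grid are all illustrative assumptions, not the paper's released code.

# Minimal sketch of few-step multistep consistency sampling (assumptions:
# a trained latent consistency model `f_theta` that maps a noisy latent
# directly to a clean-latent estimate, and a linear-beta DDPM schedule).
import torch

def ddpm_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Cumulative product of (1 - beta_t) for a linear beta schedule.
    betas = torch.linspace(beta_start, beta_end, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def sample_videolcm(f_theta, text_emb, shape, steps=(999, 749, 499, 249)):
    """Four-step consistency sampling of a video latent with shape
    (batch, channels, frames, height, width); decode with the VAE afterwards."""
    acp = ddpm_alphas_cumprod()
    z = torch.randn(shape)                 # start from pure noise at t = steps[0]
    for i, t in enumerate(steps):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        z0 = f_theta(z, t_batch, text_emb) # one-shot prediction of the clean latent
        if i + 1 < len(steps):
            # Re-noise the clean estimate to the next, smaller timestep.
            t_next = steps[i + 1]
            a = acp[t_next].sqrt()
            s = (1.0 - acp[t_next]).sqrt()
            z = a * z0 + s * torch.randn_like(z0)
        else:
            z = z0                         # final step: return the clean latent
    return z

Each iteration maps the current noisy latent straight to a clean-latent estimate and then re-noises it to the next timestep, which is why so few steps suffice compared with the dozens of denoising steps a standard diffusion sampler needs.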