VideoLCM: Video Latent Consistency Model

Abstract

Consistency models have demonstrated strong capability in efficient image generation, enabling synthesis within a few sampling steps and alleviating the high computational cost of diffusion models. However, consistency models remain underexplored in the more challenging and resource-intensive task of video generation. In this report, we present the VideoLCM framework to fill this gap. It leverages the concept of consistency models from image generation to synthesize videos efficiently in minimal steps while maintaining high quality. VideoLCM builds on existing latent video diffusion models and incorporates consistency distillation to train the latent consistency model. Experimental results demonstrate the effectiveness of VideoLCM in terms of computational efficiency, fidelity, and temporal consistency. Notably, VideoLCM achieves high-fidelity, smooth video synthesis with only four sampling steps, showing the potential for real-time synthesis. We hope that VideoLCM can serve as a simple yet effective baseline for subsequent research. The source code and models will be made publicly available.
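
To illustrate what sampling in only four steps can look like, the following is a minimal sketch of the generic multistep consistency sampling loop from the consistency-model literature. It is not the VideoLCM implementation. The consistency function here is a hypothetical closed-form stand-in (exact only for zero-mean Gaussian toy data under the noising x_t = x_0 + t*z) in place of a distilled video network, and the noise levels, constants, and latent shape are all assumptions chosen for illustration.

```python
import numpy as np

SIGMA_DATA = 0.5   # std of the toy Gaussian "data" distribution (assumption)
EPS = 0.002        # smallest noise level, where the consistency function is the identity
T_MAX = 80.0       # largest noise level (assumption)


def toy_consistency_fn(x, t):
    """Hypothetical stand-in for a distilled network.

    Exact consistency function for zero-mean Gaussian data under
    x_t = x_0 + t * z: maps a sample at noise level t to the point at
    level EPS on the same probability-flow ODE trajectory.
    """
    return x * np.sqrt(SIGMA_DATA**2 + EPS**2) / np.sqrt(SIGMA_DATA**2 + t**2)


def multistep_consistency_sample(f, shape, timesteps, rng):
    """Multistep consistency sampling: one call to f per entry in timesteps."""
    # Step 1: map pure noise at the largest level directly to a clean estimate.
    x = f(rng.standard_normal(shape) * timesteps[0], timesteps[0])
    # Remaining steps: re-noise to a smaller level, then map back again.
    for t in timesteps[1:]:
        noise = rng.standard_normal(shape)
        x_t = x + np.sqrt(t**2 - EPS**2) * noise
        x = f(x_t, t)
    return x


rng = np.random.default_rng(0)
timesteps = [T_MAX, 20.0, 5.0, 1.0]   # four sampling steps, decreasing noise
# Toy latent "video": (frames, channels, height, width), shape is an assumption.
latents = multistep_consistency_sample(
    toy_consistency_fn, (16, 4, 8, 8), timesteps, rng
)
```

Each entry in `timesteps` costs exactly one evaluation of the consistency function, so the list length is the number of sampling steps. In a real latent video pipeline the resulting latents would then be decoded to frames, a stage this sketch omits.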
