DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

Abstract

We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting the model to a deep compression latent space with lightweight fine-tuning. The framework builds on two key innovations: (i) a Deep Compression Video Autoencoder with a novel chunk-causal temporal design that achieves 32x/64x spatial and 4x temporal compression while preserving reconstruction quality and generalization to longer videos; and (ii) AE-Adapt-V, a robust adaptation strategy that enables rapid and stable transfer of pre-trained models into the new latent space. Adapting the pre-trained Wan-2.1-14B model with DC-VideoGen requires only 10 GPU days on NVIDIA H100 GPUs. The accelerated models achieve up to 14.8x lower inference latency than their base counterparts without compromising quality, and further enable 2160x3840 video generation on a single GPU. Code: https://github.com/dc-ai-projects/DC-VideoGen.
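To give a feel for what the stated compression ratios mean for the size of the latent grid, the following is a minimal sketch, not the authors' code. It assumes plain ceiling division along each axis, a hypothetical 80-frame clip, and an 8x spatial baseline for comparison; none of these three details come from the abstract. The function names `latent_grid` and `position_count` are illustrative only, and the sketch ignores latent channel count, patchification, and the chunk-causal frame layout.

```python
import math


def latent_grid(frames, height, width, spatial_factor, temporal_factor):
    """Latent (frames, height, width) under plain ceiling division (an assumption)."""
    return (
        math.ceil(frames / temporal_factor),
        math.ceil(height / spatial_factor),
        math.ceil(width / spatial_factor),
    )


def position_count(grid):
    """Number of spatio-temporal positions in a latent grid."""
    t, h, w = grid
    return t * h * w


# Hypothetical clip: 80 frames at the 2160x3840 resolution named in the abstract.
FRAMES, HEIGHT, WIDTH = 80, 2160, 3840

# Assumed baseline for comparison: 8x spatial, 4x temporal compression.
baseline = latent_grid(FRAMES, HEIGHT, WIDTH, spatial_factor=8, temporal_factor=4)

# Deep compression setting from the abstract: 32x spatial, 4x temporal.
deep = latent_grid(FRAMES, HEIGHT, WIDTH, spatial_factor=32, temporal_factor=4)

reduction = position_count(baseline) / position_count(deep)
print(f"baseline grid: {baseline}, deep grid: {deep}, reduction: {reduction:.1f}x")
```

Under these assumptions the latent grid shrinks by roughly 16x when the spatial factor goes from 8 to 32. This count is not the same quantity as the reported latency improvement, which also depends on the model architecture and hardware.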
