DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
September 29, 2025
Authors: Junyu Chen, Wenkun He, Yuchao Gu, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai
cs.AI
Abstract
We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space with lightweight fine-tuning. The framework builds on two key innovations: (i) a Deep Compression Video Autoencoder with a novel chunk-causal temporal design that achieves 32x/64x spatial and 4x temporal compression while preserving reconstruction quality and generalization to longer videos; and (ii) AE-Adapt-V, a robust adaptation strategy that enables rapid and stable transfer of pre-trained models into the new latent space. Adapting the pre-trained Wan-2.1-14B model with DC-VideoGen requires only 10 NVIDIA H100 GPU days. The accelerated models achieve up to 14.8x lower inference latency than their base counterparts without compromising quality, and further enable 2160x3840 video generation on a single GPU. Code: https://github.com/dc-ai-projects/DC-VideoGen.
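The abstract does not spell out the chunk-causal temporal design, but a common reading of the term is that frames attend bidirectionally within a fixed-size temporal chunk while chunks only attend to earlier chunks, which keeps the model causal across time and helps it generalize to longer videos. Below is a minimal sketch of such a mask; the chunk size and the exact masking scheme are assumptions for illustration, not the paper's specification:

```python
import torch

def chunk_causal_mask(num_frames: int, chunk_size: int = 4) -> torch.Tensor:
    """Boolean attention mask over frames: True where attention is allowed.

    Frames within the same chunk attend to each other bidirectionally;
    across chunks, attention is causal (later chunks are never visible).
    """
    chunk_idx = torch.arange(num_frames) // chunk_size   # chunk index per frame
    return chunk_idx[:, None] >= chunk_idx[None, :]      # query chunk >= key chunk

mask = chunk_causal_mask(8, chunk_size=4)
# Frames 0-3 see each other; frames 4-7 see each other plus frames 0-3.
```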
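To get intuition for why deeper compression yields the reported speedups, it helps to count latent tokens. The sketch below is illustrative only (the 81-frame input, the f8 baseline, and ceil-style padding are assumptions, not the paper's setup); it compares a conventional f8 video VAE against the f32 spatial / 4x temporal setting at the 2160x3840 resolution mentioned above:

```python
import math

def latent_tokens(frames: int, height: int, width: int,
                  spatial: int, temporal: int) -> int:
    """Number of latent positions the diffusion model must process."""
    t = math.ceil(frames / temporal)   # temporal compression
    h = math.ceil(height / spatial)    # real pipelines pad H and W to
    w = math.ceil(width / spatial)     # multiples of the spatial factor
    return t * h * w

base = latent_tokens(81, 2160, 3840, spatial=8,  temporal=4)   # common f8 video VAE
deep = latent_tokens(81, 2160, 3840, spatial=32, temporal=4)   # DC-VideoGen f32 setting

print(f"token reduction: {base / deep:.1f}x")   # ~15.9x fewer latent tokens;
# with attention cost quadratic in token count, end-to-end speedups can be larger.
```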