DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
September 29, 2025
Authors: Junyu Chen, Wenkun He, Yuchao Gu, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai
cs.AI
Abstract
We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space with lightweight fine-tuning. The framework builds on two key innovations: (i) a Deep Compression Video Autoencoder with a novel chunk-causal temporal design that achieves 32x/64x spatial and 4x temporal compression while preserving reconstruction quality and generalization to longer videos; and (ii) AE-Adapt-V, a robust adaptation strategy that enables rapid and stable transfer of pre-trained models into the new latent space. Adapting the pre-trained Wan-2.1-14B model with DC-VideoGen requires only 10 NVIDIA H100 GPU days. The accelerated models achieve up to 14.8x lower inference latency than their base counterparts without compromising quality, and further enable 2160x3840 video generation on a single GPU. Code: https://github.com/dc-ai-projects/DC-VideoGen.
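The abstract does not spell out the chunk-causal temporal design, but a common reading of the term is that frames attend bidirectionally within a fixed-size temporal chunk while chunks only attend to earlier chunks, which keeps the model causal across time and helps it generalize to longer videos. Below is a minimal sketch of such a mask; the chunk size and the exact masking scheme are assumptions for illustration, not the paper's specification:

```python
import torch

def chunk_causal_mask(num_frames: int, chunk_size: int = 4) -> torch.Tensor:
    """Boolean attention mask over frames: True where attention is allowed.

    Frames within the same chunk attend to each other bidirectionally;
    across chunks, attention is causal (later chunks are never visible).
    """
    chunk_idx = torch.arange(num_frames) // chunk_size   # chunk index per frame
    return chunk_idx[:, None] >= chunk_idx[None, :]      # query chunk >= key chunk

mask = chunk_causal_mask(8, chunk_size=4)
# Frames 0-3 see each other; frames 4-7 see each other plus frames 0-3.
```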
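To get intuition for why deeper compression yields the reported speedups, it helps to count latent tokens. The sketch below is illustrative only (the 81-frame input, the f8 baseline, and ceil-style padding are assumptions, not the paper's setup); it compares a conventional f8 video VAE against the f32 spatial / 4x temporal setting at the 2160x3840 resolution mentioned above:

```python
import math

def latent_tokens(frames: int, height: int, width: int,
                  spatial: int, temporal: int) -> int:
    """Number of latent positions the diffusion model must process."""
    t = math.ceil(frames / temporal)   # temporal compression
    h = math.ceil(height / spatial)    # real pipelines pad H and W to
    w = math.ceil(width / spatial)     # multiples of the spatial factor
    return t * h * w

base = latent_tokens(81, 2160, 3840, spatial=8,  temporal=4)   # common f8 video VAE
deep = latent_tokens(81, 2160, 3840, spatial=32, temporal=4)   # DC-VideoGen f32 setting

print(f"token reduction: {base / deep:.1f}x")   # ~15.9x fewer latent tokens;
# with attention cost quadratic in token count, end-to-end speedups can be larger.
```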