DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
September 29, 2025
Authors: Wenkun He, Yuchao Gu, Junyu Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai
cs.AI
Abstract
Existing text-to-image diffusion models excel at generating high-quality
images, but face significant efficiency challenges when scaled to high
resolutions, such as 4K image generation. While previous research has
accelerated diffusion models in various ways, it seldom addresses the inherent
redundancy within the latent space. To bridge this gap, this paper introduces
DC-Gen, a
general framework that accelerates text-to-image diffusion models by leveraging
a deeply compressed latent space. Rather than a costly training-from-scratch
approach, DC-Gen uses an efficient post-training pipeline to preserve the
quality of the base model. A key challenge in this paradigm is the
representation gap between the base model's latent space and a deeply
compressed latent space, which can lead to instability during direct
fine-tuning. To overcome this, DC-Gen first bridges the representation gap with
a lightweight embedding alignment training. Once the latent embeddings are
aligned, only a small amount of LoRA fine-tuning is needed to unlock the base
model's inherent generation quality. We verify DC-Gen's effectiveness on SANA
and FLUX.1-Krea. The resulting DC-Gen-SANA and DC-Gen-FLUX models achieve
quality comparable to their base models but with a significant speedup.
Specifically, DC-Gen-FLUX reduces the latency of 4K image generation by 53x on
an NVIDIA H100 GPU. When combined with NVFP4 SVDQuant, DC-Gen-FLUX generates a
4K image in just 3.5 seconds on a single NVIDIA 5090 GPU, achieving a total
latency reduction of 138x compared to the base FLUX.1-Krea model. Code:
https://github.com/dc-ai-projects/DC-Gen.
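One way to see why a deeply compressed latent space can speed up high-resolution generation is to count the tokens a diffusion transformer must process. The sketch below does that arithmetic for a 4096 × 4096 image under assumed total downsampling factors: 16 for an f8 autoencoder followed by 2×2 patchification (as in FLUX-style models), and hypothetical deeper factors of 32 and 64. The factors DC-Gen actually uses are not stated in the abstract, and the helper names are illustrative only.

```python
# Illustrative token-count arithmetic for a 4096 x 4096 image.
# Assumed total spatial downsampling factors (autoencoder stride x patch size):
#   16: an f8 autoencoder with 2x2 patchify, as in FLUX-style models (assumption)
#   32, 64: deeper latent compression (hypothetical values, not from the paper)


def num_tokens(height: int, width: int, downsample: int) -> int:
    """Number of transformer tokens after downsampling each side by `downsample`."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // downsample) * (width // downsample)


def attention_cost_ratio(base_tokens: int, new_tokens: int) -> float:
    """Self-attention FLOPs grow roughly with tokens**2; return base / new."""
    return (base_tokens / new_tokens) ** 2


if __name__ == "__main__":
    base = num_tokens(4096, 4096, 16)
    for factor in (16, 32, 64):
        n = num_tokens(4096, 4096, factor)
        print(
            f"downsample {factor:>2}: {n:>6} tokens, "
            f"{base // n:>2}x fewer tokens, "
            f"~{attention_cost_ratio(base, n):.0f}x less attention compute"
        )
```

MLP and projection layers scale only roughly linearly with token count, so the attention ratio overstates the end-to-end gain; measured speedups also depend on the autoencoder decoder, the number of sampling steps, and the hardware. These figures are not the paper's measurements.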
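The abstract does not specify the embedding-alignment objective, so the following is only a toy sketch of the general idea under stated assumptions: a new linear embedder for compressed-latent tokens is fitted by gradient descent so that its outputs match target embeddings supplied by a frozen base model. Here the targets are synthetic stand-ins, and the dimensions, learning rate, and helper names are all hypothetical.

```python
import random

# Toy sketch of an embedding-alignment step (hypothetical, not DC-Gen's recipe).
# Assumptions: each compressed-latent token is a vector x of size D_IN; a frozen
# base model supplies a target token embedding y of size D_OUT; a new linear
# embedder W is fitted by gradient descent on the mean-squared error ||W x - y||^2.
# The targets below are synthetic stand-ins for base-model embeddings.

D_IN, D_OUT, N = 3, 2, 64


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mse(w, xs, ys):
    """Mean over tokens of the squared distance between W x and the target y."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += sum((p - t) ** 2 for p, t in zip(matvec(w, x), y))
    return total / len(xs)


def align_step(w, xs, ys, lr):
    """One full-batch gradient-descent step on the MSE alignment loss."""
    grad = [[0.0] * D_IN for _ in range(D_OUT)]
    for x, y in zip(xs, ys):
        err = [p - t for p, t in zip(matvec(w, x), y)]
        for i in range(D_OUT):
            for j in range(D_IN):
                grad[i][j] += 2.0 * err[i] * x[j] / len(xs)
    return [[w[i][j] - lr * grad[i][j] for j in range(D_IN)] for i in range(D_OUT)]


def train(steps=200, lr=0.1, seed=0):
    """Fit a zero-initialized embedder; return the loss before and after."""
    rng = random.Random(seed)
    hidden_map = [[rng.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_OUT)]
    xs = [[rng.gauss(0, 1) for _ in range(D_IN)] for _ in range(N)]
    ys = [matvec(hidden_map, x) for x in xs]  # synthetic "base-model" targets
    w = [[0.0] * D_IN for _ in range(D_OUT)]
    before = mse(w, xs, ys)
    for _ in range(steps):
        w = align_step(w, xs, ys, lr)
    return before, mse(w, xs, ys)


if __name__ == "__main__":
    before, after = train()
    print(f"alignment loss: {before:.4f} -> {after:.6f}")
```

In a real model, the embedder would be a patch-embedding layer and the targets would come from the frozen transformer; the point of the sketch is only that the new input mapping is trained first, while the base network stays untouched.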
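The LoRA fine-tuning stage relies on low-rank adaptation: a frozen weight W is augmented with a trainable product B·A of rank r, scaled by alpha / r. The pure-Python sketch below shows that forward pass and why a zero-initialized B leaves the base model's behavior unchanged at the start of fine-tuning. Which layers DC-Gen adapts, and with what rank and alpha, is not stated in the abstract; the class name and values here are hypothetical.

```python
import random

# Minimal LoRA linear layer (low-rank adaptation), written in pure Python for
# illustration. Which layers DC-Gen adapts, and with what rank and alpha, is
# not stated in the abstract; the values used below are hypothetical.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base-model weight, d_out x d_in
        self.scale = alpha / rank
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at
        # zero, so before any fine-tuning the layer matches the base layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def num_trainable(self):
        """Trainable parameter count: rank * (d_in + d_out)."""
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)


if __name__ == "__main__":
    w = [[float(i + j) for j in range(6)] for i in range(4)]  # toy 4 x 6 weight
    layer = LoRALinear(w, rank=2, alpha=4.0)
    x = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0]
    print(layer.forward(x) == matvec(w, x))   # B is zero-initialized
    print(layer.num_trainable(), "trainable vs", 4 * 6, "frozen")
```

At toy sizes the saving is small, but for a 3072 × 3072 projection with rank 16 the adapter has 16 × (3072 + 3072) = 98,304 trainable parameters versus 9,437,184 in the frozen weight, about 1%.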