DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
September 29, 2025
Authors: Wenkun He, Yuchao Gu, Junyu Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai
cs.AI
Abstract
Existing text-to-image diffusion models excel at generating high-quality
images, but face significant efficiency challenges when scaled to high
resolutions, such as 4K image generation. While previous research accelerates
diffusion models in various aspects, it seldom addresses the inherent redundancy
within the latent space. To bridge this gap, this paper introduces DC-Gen, a
general framework that accelerates text-to-image diffusion models by leveraging
a deeply compressed latent space. Rather than a costly training-from-scratch
approach, DC-Gen uses an efficient post-training pipeline to preserve the
quality of the base model. A key challenge in this paradigm is the
representation gap between the base model's latent space and a deeply
compressed latent space, which can lead to instability during direct
fine-tuning. To overcome this, DC-Gen first bridges the representation gap with
a lightweight embedding alignment training stage. Once the latent embeddings are
aligned, only a small amount of LoRA fine-tuning is needed to unlock the base
model's inherent generation quality. We verify DC-Gen's effectiveness on SANA
and FLUX.1-Krea. The resulting DC-Gen-SANA and DC-Gen-FLUX models achieve
quality comparable to their base models but with a significant speedup.
Specifically, DC-Gen-FLUX reduces the latency of 4K image generation by 53x on
the NVIDIA H100 GPU. When combined with NVFP4 SVDQuant, DC-Gen-FLUX generates a
4K image in just 3.5 seconds on a single NVIDIA 5090 GPU, achieving a total
latency reduction of 138x compared to the base FLUX.1-Krea model. Code:
https://github.com/dc-ai-projects/DC-Gen.
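
To make the link between latent compression and latency concrete, the following is a rough back-of-the-envelope sketch, not the authors' code. The square 4096-pixel image side, the compression factors (8 and 64), and the patch sizes are illustrative assumptions that the abstract does not state. Only the 3.5-second and 138x figures come from the abstract, and the baseline latency derived from them is an implied estimate, not a reported measurement.

```python
def latent_tokens(image_side: int, compression: int, patch: int = 1) -> int:
    """Number of latent tokens a diffusion transformer sees for a square image.

    compression: spatial downsampling factor of the autoencoder (assumed values).
    patch: patch size applied to the latent grid before the transformer.
    """
    side = image_side // (compression * patch)
    return side * side


SIDE_4K = 4096  # assumption: treat "4K" as a 4096 x 4096 image

# Hypothetical settings, chosen only to illustrate the scaling.
base_tokens = latent_tokens(SIDE_4K, compression=8, patch=2)
deep_tokens = latent_tokens(SIDE_4K, compression=64, patch=1)

# Fewer tokens means less work per denoising step; self-attention cost
# grows roughly with the square of the token count.
token_ratio = base_tokens / deep_tokens
attention_ratio = token_ratio ** 2

# Figures quoted in the abstract: 3.5 s per 4K image and a 138x total reduction.
# The implied baseline latency is derived here, not reported by the authors.
implied_base_latency_s = 3.5 * 138

print(f"tokens: {base_tokens} -> {deep_tokens} ({token_ratio:.0f}x fewer)")
print(f"approx. attention cost ratio: {attention_ratio:.0f}x")
print(f"implied baseline latency: {implied_base_latency_s:.0f} s")
```

Under these assumed settings the token count drops 16-fold, which suggests why a deeply compressed latent space can yield large speedups at high resolution. Measured speedups depend on the actual model and hardware.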