Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression
August 7, 2025
Authors: Zheng Chen, Mingde Zhou, Jinpei Guo, Jiale Yuan, Yifei Ji, Yulun Zhang
cs.AI
Abstract
Diffusion-based image compression has demonstrated impressive perceptual
performance. However, it suffers from two critical drawbacks: (1) excessive
decoding latency due to multi-step sampling, and (2) poor fidelity resulting
from over-reliance on generative priors. To address these issues, we propose
SODEC, a novel single-step diffusion image compression model. We argue that in
image compression, a sufficiently informative latent renders multi-step
refinement unnecessary. Based on this insight, we leverage a pre-trained
VAE-based model to produce latents with rich information, and replace the
iterative denoising process with a single-step decoding. Meanwhile, to improve
fidelity, we introduce the fidelity guidance module, encouraging output that is
faithful to the original image. Furthermore, we design the rate annealing
training strategy to enable effective training under extremely low bitrates.
Extensive experiments show that SODEC significantly outperforms existing
methods, achieving superior rate-distortion-perception performance. Moreover,
compared to previous diffusion-based compression models, SODEC improves
decoding speed by more than 20×. Code is released at:
https://github.com/zhengchen1999/SODEC.