Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression

Abstract


Diffusion-based image compression has demonstrated impressive perceptual performance. However, it suffers from two critical drawbacks: (1) excessive decoding latency due to multi-step sampling, and (2) poor fidelity resulting from over-reliance on generative priors. To address these issues, we propose SODEC, a novel single-step diffusion image compression model. We argue that in image compression, a sufficiently informative latent renders multi-step refinement unnecessary. Based on this insight, we leverage a pre-trained VAE-based model to produce information-rich latents and replace the iterative denoising process with single-step decoding. Meanwhile, to improve fidelity, we introduce a fidelity guidance module that encourages outputs faithful to the original image. Furthermore, we design a rate annealing training strategy to enable effective training at extremely low bitrates. Extensive experiments show that SODEC significantly outperforms existing methods, achieving superior rate-distortion-perception performance. Moreover, compared to previous diffusion-based compression models, SODEC improves decoding speed by more than 20 times. Code is released at: https://github.com/zhengchen1999/SODEC.
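The latency argument can be made concrete by counting network evaluations. The sketch below is a toy illustration only, not SODEC's implementation: `multi_step_decode`, `single_step_decode`, the toy denoiser, the toy generator, and the 50-step schedule are all hypothetical stand-ins for a learned network and a sampler. It shows only that iterative sampling costs one network call per timestep, whereas single-step decoding from an already informative latent costs one call in total.

```python
from typing import Callable, Iterable, List, Tuple

Latent = List[float]


def multi_step_decode(
    z_T: Latent,
    denoise_step: Callable[[Latent, int], Latent],
    timesteps: Iterable[int],
) -> Tuple[Latent, int]:
    """Iterative sampling: refine the latent once per timestep.

    Returns the final latent and the number of network evaluations.
    """
    z = list(z_T)
    evals = 0
    for t in timesteps:
        z = denoise_step(z, t)  # one (hypothetical) network call per step
        evals += 1
    return z, evals


def single_step_decode(
    latent: Latent,
    generator: Callable[[Latent], Latent],
) -> Tuple[Latent, int]:
    """Single-step decoding: one evaluation maps the latent to the output."""
    return generator(latent), 1


if __name__ == "__main__":
    target = [0.5, -0.25, 1.0]

    # Toy denoiser (hypothetical): moves halfway toward the target each step.
    def toy_denoise(z: Latent, t: int) -> Latent:
        return [zi + 0.5 * (ti - zi) for zi, ti in zip(z, target)]

    # Toy generator (hypothetical): assumes the latent is already informative.
    def toy_generator(latent: Latent) -> Latent:
        return list(latent)

    _, n_multi = multi_step_decode([0.0, 0.0, 0.0], toy_denoise, range(50, 0, -1))
    _, n_single = single_step_decode(target, toy_generator)
    print(n_multi, n_single)  # 50 1
```

In this toy model, decoding cost grows linearly with the number of timesteps. The speedup actually reported for SODEC depends on the networks and samplers being compared and is not reproduced by this sketch.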
