High-Fidelity Image Compression with Score-based Generative Models

Abstract

Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID. This is achieved using a simple but theoretically motivated two-stage approach that combines an autoencoder trained to minimize MSE with a subsequent score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
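To make the two-stage structure concrete, here is a minimal NumPy sketch under loudly stated assumptions; it is not the authors' implementation. Stage 1 is a hypothetical stand-in for the MSE-trained autoencoder (2x2 average pooling, uniform quantization, nearest-neighbour upsampling, with no rate term), and stage 2 is an epsilon-prediction diffusion loss for a decoder conditioned on the stage-1 reconstruction `x_hat`. The cosine variance-preserving schedule and all function names (`noise_schedule`, `stage1_mse_codec`, `stage2_loss`, `baseline_denoiser`) are illustrative assumptions.

```python
import numpy as np


def noise_schedule(t):
    """Assumed variance-preserving cosine schedule: alpha^2 + sigma^2 = 1, t in (0, 1]."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def stage1_mse_codec(x, step=0.25):
    """Hypothetical stand-in for the MSE-trained autoencoder.

    Encoder: 2x2 average pooling; bottleneck: uniform quantization with the
    given step; decoder: nearest-neighbour upsampling. A learned codec would
    replace all three and add a rate term to its MSE objective.
    """
    h, w = x.shape
    z = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # encode
    z_q = np.round(z / step) * step                         # quantize
    return np.repeat(np.repeat(z_q, 2, axis=0), 2, axis=1)  # decode


def stage2_loss(denoiser, x, x_hat, t, rng):
    """Epsilon-prediction diffusion loss for a decoder conditioned on x_hat."""
    alpha, sigma = noise_schedule(t)
    eps = rng.standard_normal(x.shape)
    z_t = alpha * x + sigma * eps       # noised original image
    eps_pred = denoiser(z_t, t, x_hat)  # conditioned on the stage-1 output
    return np.mean((eps - eps_pred) ** 2)


def baseline_denoiser(z_t, t, x_hat):
    """Untrained baseline that assumes x == x_hat; a real score-based decoder
    would be a neural network taking (z_t, t, x_hat) as input."""
    alpha, sigma = noise_schedule(t)
    return (z_t - alpha * x_hat) / sigma


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 8))  # toy grayscale "image"
x_hat = stage1_mse_codec(x)              # stage 1: MSE reconstruction
loss = stage2_loss(baseline_denoiser, x, x_hat, 0.3, rng)
print(f"stage-1 MSE: {np.mean((x - x_hat) ** 2):.4f}, stage-2 loss: {loss:.4f}")
```

For the baseline denoiser the loss works out to exactly (alpha/sigma)^2 times the stage-1 MSE, independent of the sampled noise. In this toy setting, then, the stage-2 loss is driven entirely by the stage-1 error, and a trained decoder can lower it only by modelling the detail that the MSE reconstruction averages away.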