High-Fidelity Image Compression with Score-based Generative Models

Abstract

Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved with a simple but theoretically motivated two-stage approach: an autoencoder trained for MSE, followed by a score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
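The intuition behind a two-stage decoder can be illustrated with a toy one-dimensional sketch. This is not the paper's model. It assumes Gaussian data, models the lossy code as additive Gaussian noise instead of a quantized latent, and replaces the learned diffusion decoder with unadjusted Langevin dynamics driven by the analytic posterior score. The names `toy_two_stage_codec`, `code_noise_var` and `step_size` are hypothetical. Stage 1 returns the MSE-optimal reconstruction, which is the posterior mean and is therefore under-dispersed ("blurry"). Stage 2 samples around it, which restores the data variance. The cost is roughly double the MSE, a standard property of exact posterior sampling.

```python
import random
import statistics


def toy_two_stage_codec(n=5000, code_noise_var=1.0, steps=200, step_size=0.02, seed=0):
    """Toy 1-D two-stage decoder (hypothetical setup, not the paper's model).

    Assumptions: data x ~ N(0, 1); the lossy code is modelled as
    y = x + e with e ~ N(0, code_noise_var) instead of a quantized latent.
    """
    rng = random.Random(seed)
    shrink = 1.0 / (1.0 + code_noise_var)               # E[x | y] = shrink * y
    post_var = code_noise_var / (1.0 + code_noise_var)  # Var[x | y]

    xs, recons, samples = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, code_noise_var ** 0.5)

        # Stage 1: the MSE-optimal decoder outputs the posterior mean E[x | y].
        x_mse = shrink * y

        # Stage 2: a stand-in score-based decoder. Unadjusted Langevin dynamics
        # driven by the analytic posterior score -(z - x_mse) / post_var,
        # initialised at the stage-1 reconstruction.
        z = x_mse
        for _ in range(steps):
            score = -(z - x_mse) / post_var
            z += step_size * score + (2.0 * step_size) ** 0.5 * rng.gauss(0.0, 1.0)

        xs.append(x)
        recons.append(x_mse)
        samples.append(z)

    return {
        "var_data": statistics.pvariance(xs),
        "var_mse": statistics.pvariance(recons),
        "var_sample": statistics.pvariance(samples),
        "mse_mean": statistics.fmean((a - b) ** 2 for a, b in zip(xs, recons)),
        "mse_sample": statistics.fmean((a - b) ** 2 for a, b in zip(xs, samples)),
    }


if __name__ == "__main__":
    for key, value in toy_two_stage_codec().items():
        print(f"{key}: {value:.3f}")
```

With the defaults, the stage-1 reconstructions should have about half the data variance. The Langevin samples should recover a variance close to 1 and an MSE close to twice that of the posterior mean. The finite `step_size` adds a small discretization bias that slightly inflates the sample variance.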
