High-Fidelity Image Compression with Score-based Generative Models
May 26, 2023
Authors: Emiel Hoogeboom, Eirikur Agustsson, Fabian Mentzer, Luca Versari, George Toderici, Lucas Theis
cs.AI
Abstract
Despite the tremendous success of diffusion generative models in
text-to-image generation, replicating this success in the domain of image
compression has proven difficult. In this paper, we demonstrate that diffusion
can significantly improve perceptual quality at a given bit-rate, outperforming
state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is
achieved using a simple but theoretically motivated two-stage approach
combining an autoencoder targeting MSE followed by a further score-based
decoder. However, as we will show, implementation details matter and the
optimal design decisions can differ greatly from typical text-to-image models.