PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling

Abstract

We introduce PerCoV2, a novel and open ultra-low bit-rate perceptual image compression system designed for bandwidth- and storage-constrained applications. Building upon prior work by Careil et al., PerCoV2 extends the original formulation to the Stable Diffusion 3 ecosystem and enhances entropy coding efficiency by explicitly modeling the discrete hyper-latent image distribution. To this end, we conduct a comprehensive comparison of recent autoregressive methods (VAR and MaskGIT) for entropy modeling and evaluate our approach on the large-scale MSCOCO-30k benchmark. Compared to previous work, PerCoV2 (i) achieves higher image fidelity at even lower bit-rates while maintaining competitive perceptual quality, (ii) features a hybrid generation mode for further bit-rate savings, and (iii) is built solely on public components. Code and trained models will be released at https://github.com/Nikolai10/PerCoV2.
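
To make the entropy-coding claim concrete, the following is a minimal sketch of why an explicit probability model over discrete symbols can lower the bit cost. It is not the PerCoV2 implementation: the codebook size, the symbol sequence, the probability tables, and the function name `ideal_code_length_bits` are all hypothetical and chosen only for illustration. It relies on the standard result that an ideal entropy coder spends about -log2 p(s) bits on a symbol s.

```python
import math


def ideal_code_length_bits(symbols, probs):
    """Ideal entropy-coded length in bits: the sum of -log2 p(symbol)."""
    return sum(-math.log2(probs[s]) for s in symbols)


# Hypothetical toy data: 8 hyper-latent indices from a 4-entry codebook.
symbols = [0, 0, 0, 1, 0, 2, 0, 0]

# Without a learned model, every index is equally likely (2 bits per symbol).
uniform = {k: 0.25 for k in range(4)}

# Hypothetical learned prior that assigns more mass to the frequent index 0.
learned = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}

bits_uniform = ideal_code_length_bits(symbols, uniform)
bits_learned = ideal_code_length_bits(symbols, learned)

print(f"uniform prior: {bits_uniform:.2f} bits")
print(f"learned prior: {bits_learned:.2f} bits")
```

In this toy setting the learned prior needs fewer bits than the uniform one because it matches the symbol statistics more closely. Autoregressive or masked models such as those compared in the paper supply context-dependent probabilities instead of a single fixed table, but the bit-cost accounting is the same.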

