CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Abstract

Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), an Image-Text-Image paradigm known as Cross Modality Compression (CMC) has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, giving it strong application potential. However, CMC has certain shortcomings in consistency with the original image and in perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. The benchmark covers 18,000 and 40,000 images, used respectively to verify 6 mainstream I2T models and 12 T2I models, and includes 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, we show that certain combinations of I2T and T2I models already surpass the most advanced visual signal codecs; we also highlight where LMMs can be further optimized for the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
