CMC-Bench: Towards a New Paradigm of Visual Signal Compression
June 13, 2024
Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
cs.AI
Abstract
Ultra-low bitrate image compression is a challenging and demanding topic.
With the development of Large Multimodal Models (LMMs), an Image-Text-Image
paradigm of Cross Modality Compression (CMC) has emerged. Compared with
traditional codecs, this semantic-level compression can reduce image data size
to 0.1% or even lower, giving it strong application potential. However, CMC
still falls short in consistency with the original image and in perceptual
quality. To address this problem, we introduce CMC-Bench, a benchmark of the
cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models
for image compression. The benchmark uses 18,000 and 40,000 images to
evaluate 6 mainstream I2T models and 12 T2I models, respectively, and includes
160,000 subjective preference scores annotated by human experts. At ultra-low
bitrates, this paper shows that some combinations of I2T and T2I models have
surpassed the most advanced visual signal codecs; it also highlights where LMMs
can be further optimized for the compression task. We encourage LMM developers
to participate in this test to promote the evolution of visual signal codec
protocols.
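
To give a rough sense of scale for the 0.1% figure, the following minimal sketch
compares the size of a text payload with the size of an uncompressed image. The
image dimensions, the 786-byte payload, and the helper names are illustrative
assumptions, not values or code from CMC-Bench.

```python
def raw_size_bytes(width, height, channels=3, bits_per_channel=8):
    """Size of an uncompressed image in bytes (assumed 8-bit RGB by default)."""
    return width * height * channels * bits_per_channel // 8


def payload_ratio(payload, width, height):
    """Fraction of the raw image size taken by a UTF-8 encoded text payload."""
    return len(payload.encode("utf-8")) / raw_size_bytes(width, height)


def bits_per_pixel(payload, width, height):
    """Bitrate of the text payload, in bits per image pixel."""
    return 8 * len(payload.encode("utf-8")) / (width * height)


if __name__ == "__main__":
    # Hypothetical example: a 512x512 RGB image described by 786 bytes of text.
    caption = "a" * 786
    print(f"raw size: {raw_size_bytes(512, 512)} bytes")
    print(f"ratio:    {payload_ratio(caption, 512, 512):.4%}")
    print(f"bitrate:  {bits_per_pixel(caption, 512, 512):.4f} bpp")
```

Under these assumptions, a text description of under 800 bytes is already below
0.1% of the raw size of a 512x512 RGB image, which is the regime the abstract
refers to as ultra-low bitrate.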