CMC-Bench: Towards a New Paradigm of Visual Signal Compression

June 13, 2024
Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
cs.AI

Abstract

Ultra-low-bitrate image compression is a challenging and demanding problem. With the development of Large Multimodal Models (LMMs), an Image-Text-Image paradigm of Cross Modality Compression (CMC) has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which gives it strong application potential. However, CMC has shortcomings in both consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. The benchmark covers 18,000 and 40,000 images to evaluate 6 mainstream I2T models and 12 T2I models, respectively, and includes 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper shows that certain combinations of I2T and T2I models have surpassed the most advanced visual signal codecs; it also highlights where LMMs can be further optimized for the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
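
To give a sense of the scale involved, the arithmetic behind a text-only bitstream can be sketched as follows. This is an illustrative calculation, not the benchmark's own measurement procedure: the function names, the 512x512 resolution, the 200-character caption, and the choice of uncompressed 8-bit RGB as the baseline are all assumptions made for the example.

```python
def compression_ratio(width: int, height: int, caption: str, channels: int = 3) -> float:
    """Return caption size divided by raw 8-bit image size (both in bytes)."""
    raw_bytes = width * height * channels       # uncompressed 8-bit pixels (assumed baseline)
    text_bytes = len(caption.encode("utf-8"))   # text transmitted in place of pixels
    return text_bytes / raw_bytes


def bits_per_pixel(width: int, height: int, caption: str) -> float:
    """Return the caption's bitrate in bits per image pixel."""
    return 8 * len(caption.encode("utf-8")) / (width * height)


# Hypothetical 200-character ASCII caption standing in for an I2T model's output.
caption = "a" * 200
ratio = compression_ratio(512, 512, caption)
bpp = bits_per_pixel(512, 512, caption)
print(f"{ratio:.4%} of raw size, {bpp:.4f} bits per pixel")
```

Under these assumptions the caption is well below 0.1% of the raw pixel data, which is consistent with the order of magnitude quoted in the abstract; the actual figure depends on image resolution, caption length, and the baseline used.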