
CMC-Bench: Towards a New Paradigm of Visual Signal Compression

June 13, 2024
Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
cs.AI

Abstract

Ultra-low bitrate image compression is a challenging and in-demand topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which gives it strong application potential. However, CMC has shortcomings in consistency with the original image and in perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. The benchmark covers 18,000 and 40,000 images, respectively, to verify 6 mainstream I2T and 12 T2I models, and includes 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, we show that some combinations of I2T and T2I models already surpass the most advanced visual signal codecs; we also highlight where LMMs can be further optimized for the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
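To give a sense of scale for the "0.1% or even lower" figure, the arithmetic can be sketched as below. This is only an illustration under stated assumptions, not the benchmark's own measurement code: the function name `text_payload_stats` is hypothetical, the baseline is assumed to be an uncompressed 8-bit RGB image (the abstract does not say which baseline it uses), and the caption length and resolution are made-up inputs.

```python
def text_payload_stats(caption: str, width: int, height: int, channels: int = 3):
    """Size statistics for sending a caption in place of an image.

    Hypothetical helper for illustration only. Assumes the baseline is
    an uncompressed image with one byte per channel per pixel.
    Returns (payload_bytes, bits_per_pixel, fraction_of_raw_size).
    """
    payload_bytes = len(caption.encode("utf-8"))   # bytes actually transmitted
    raw_bytes = width * height * channels          # uncompressed 8-bit image size
    bits_per_pixel = payload_bytes * 8 / (width * height)
    return payload_bytes, bits_per_pixel, payload_bytes / raw_bytes


if __name__ == "__main__":
    # Made-up example: a 128-byte caption standing in for a 1024x1024 RGB image.
    size, bpp, fraction = text_payload_stats("x" * 128, 1024, 1024)
    print(f"{size} bytes, {bpp:.6f} bpp, {fraction:.4%} of raw size")
```

Under these assumptions, a short caption sits well below the 0.1% mark, which is why the text channel alone can reach such low bitrates; the trade-off is that the decoder must regenerate the image from the text, which is where the consistency and perceptual-quality concerns arise.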