CMC-Bench: Towards a New Paradigm of Visual Signal Compression
June 13, 2024
Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
cs.AI
Abstract
Ultra-low-bitrate image compression is a challenging, high-demand topic.
With the development of Large Multimodal Models (LMMs), a Cross Modality
Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with
traditional codecs, this semantic-level compression can reduce image data size
to 0.1% or even lower, giving it strong application potential. However, CMC
still falls short in consistency with the original image and in perceptual
quality. To address this problem, we introduce CMC-Bench, a benchmark of the
cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models
for image compression. The benchmark covers 18,000 and 40,000 images to verify
6 mainstream I2T and 12 T2I models, respectively, and includes 160,000
subjective preference scores annotated by human experts. At ultra-low bitrates,
we show that some combinations of I2T and T2I models already surpass the most
advanced visual signal codecs; we also highlight where LMMs can be further
optimized for the compression task. We encourage LMM developers
to participate in this test to promote the evolution of visual signal codec
protocols.
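
To give a rough sense of scale for the "0.1% or even lower" figure, the sketch
below computes the size ratio and bits per pixel when a text payload stands in
for an image. The function name, the 512x512 RGB image, and the 200-byte
caption are illustrative assumptions, not values from the paper, and the ratio
is taken against raw 8-bit pixels, which may differ from the baseline the
authors use.

```python
def compression_stats(width, height, payload_bytes, channels=3):
    """Return (size ratio vs. raw 8-bit image, bits per pixel) for a payload.

    Hypothetical helper for illustration only; not part of CMC-Bench.
    """
    raw_bytes = width * height * channels       # uncompressed size, 1 byte per channel
    ratio = payload_bytes / raw_bytes           # fraction of the raw size
    bpp = payload_bytes * 8 / (width * height)  # bits per pixel of the payload
    return ratio, bpp


if __name__ == "__main__":
    # Assumed example: a 512x512 RGB image represented by a 200-byte caption.
    ratio, bpp = compression_stats(512, 512, 200)
    print(f"size ratio: {ratio:.4%}, bits per pixel: {bpp:.4f}")
```

Under these assumptions the payload falls well below 0.1% of the raw pixel
data; real CMC pipelines may transmit more than a caption, so this should be
read as a back-of-the-envelope estimate only.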