ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
April 10, 2025
Authors: Yijun Liang, Ming Li, Chenrui Fan, Ziyue Li, Dang Nguyen, Kwesi Cobbina, Shweta Bhardwaj, Jiuhai Chen, Fuxiao Liu, Tianyi Zhou
cs.AI
Abstract
Color plays an important role in human perception and usually provides
critical clues in visual reasoning. However, it is unclear whether and how
vision-language models (VLMs) can perceive, understand, and leverage color as
humans do. This paper introduces ColorBench, an innovative benchmark meticulously
crafted to assess the capabilities of VLMs in color understanding, including
color perception, reasoning, and robustness. By curating a suite of diverse
test scenarios, with grounding in real applications, ColorBench evaluates how
these models perceive colors, infer meanings from color-based cues, and
maintain consistent performance under varying color transformations. Through an
extensive evaluation of 32 VLMs with varying language models and vision
encoders, our paper reveals several previously unreported findings: (i) The scaling law
(larger models are better) still holds on ColorBench, while the language model
plays a more important role than the vision encoder. (ii) However, the
performance gaps across models are relatively small, indicating that color
understanding has been largely neglected by existing VLMs. (iii) Chain-of-thought
(CoT) reasoning improves color understanding accuracy and robustness, even
though these are vision-centric tasks. (iv) Color clues are indeed leveraged by
VLMs on ColorBench, but they can also mislead models in some tasks. These findings
highlight the critical limitations of current VLMs and underscore the need to
enhance color comprehension. Our ColorBench can serve as a foundational tool for
advancing the study of human-level color understanding in multimodal AI.