MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
November 18, 2025
Authors: Huiyi Chen, Jiawei Peng, Dehai Min, Changchang Sun, Kaijie Chen, Yan Yan, Xu Yang, Lu Cheng
cs.AI
Abstract
Evaluating the robustness of Large Vision-Language Models (LVLMs) is essential for their continued development and responsible deployment in real-world applications. However, existing robustness benchmarks typically focus on hallucination or misleading textual inputs, while largely overlooking the equally critical challenge that misleading visual inputs pose to visual understanding. To fill this gap, we introduce MVI-Bench, the first comprehensive benchmark specifically designed to evaluate how Misleading Visual Inputs undermine the robustness of LVLMs. Grounded in fundamental visual primitives, MVI-Bench is organized around three hierarchical levels of misleading visual inputs: Visual Concept, Visual Attribute, and Visual Relationship. Using this taxonomy, we curate six representative categories and compile 1,248 expertly annotated VQA instances. To enable fine-grained robustness evaluation, we further introduce MVI-Sensitivity, a novel metric that characterizes LVLM robustness at a granular level. Empirical results across 18 state-of-the-art LVLMs reveal pronounced vulnerabilities to misleading visual inputs, and our in-depth analyses of MVI-Bench provide actionable insights to guide the development of more reliable and robust LVLMs. The benchmark and codebase are available at https://github.com/chenyil6/MVI-Bench.