
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation

June 26, 2025
作者: Xinzhuo Li, Adheesh Juvekar, Xingyou Liu, Muntasir Wahed, Kiet A. Nguyen, Ismini Lourentzou
cs.AI

Abstract

Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
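To make the counterfactual-evaluation idea concrete, here is a minimal sketch of how hallucination sensitivity under a scene edit could be scored. The function names and the specific score are illustrative assumptions, not the paper's actual metrics (which are not reproduced here): we measure mask quality on the factual image with IoU, and treat any pixels the model still segments on the counterfactual image (where the queried object has been edited out) as hallucinated.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 0.0

def hallucination_scores(pred_factual: np.ndarray,
                         gt_factual: np.ndarray,
                         pred_counterfactual: np.ndarray):
    """Hypothetical scoring sketch (not the benchmark's exact metrics).

    Returns (grounding, residual):
      grounding -- IoU on the factual image, where the object exists;
      residual  -- fraction of pixels still segmented on the counterfactual
                   image after the object was removed. A faithful model
                   should predict an (almost) empty mask there, so a high
                   residual signals a vision-driven hallucination.
    """
    grounding = iou(pred_factual, gt_factual)
    residual = float(pred_counterfactual.mean())
    return grounding, residual
```

For example, a model that perfectly segments the object in the factual image but predicts an empty mask once the object is edited out would score `grounding = 1.0` and `residual = 0.0`; a model that keeps segmenting the now-absent object would show a nonzero residual.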