

FINER: MLLMs Hallucinate under Fine-grained Negative Queries

March 18, 2026
Authors: Rui Xiao, Sanghwan Kim, Yongqin Xian, Zeynep Akata, Stephan Alaniz
cs.AI

Abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly with fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, leveraging Direct Preference Optimization (DPO) on FINER-inspired data. Finetuning four frontier MLLMs with FINER-Tuning yields up to 24.2% gains (InternVL3.5-14B) on hallucinations from our benchmarks, while simultaneously improving performance on eight existing hallucination suites and enhancing general multimodal capabilities across six benchmarks. Code, benchmarks, and models are available at https://explainableml.github.io/finer-project/.