FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly on fine-grained queries, a challenge underrepresented in existing benchmarks, which focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, which applies Direct Preference Optimization (DPO) to FINER-inspired data. Fine-tuning four frontier MLLMs with FINER-Tuning yields gains of up to 24.2% (InternVL3.5-14B) against hallucinations on our benchmarks, while also improving performance on eight existing hallucination suites and enhancing general multimodal capabilities across six benchmarks. Code, benchmark, and models are available at https://explainableml.github.io/finer-project/.
