
CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

February 27, 2026
Authors: Yuyang Hong, Jiaqi Gu, Yujin Lou, Lubin Fan, Qi Yang, Ying Wang, Kun Ding, Yue Wu, Shiming Xiang, Jieping Ye
cs.AI

Abstract
Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. However, conflicts arise between the static parametric knowledge that vision language models (VLMs) acquire during pre-training and dynamically retrieved information. The outputs either ignore retrieved contexts or integrate them inconsistently with parametric knowledge, posing substantial challenges for KB-VQA. Current knowledge conflict mitigation methods are primarily adapted from language-based approaches and focus on context-level conflicts through engineered prompting strategies or context-aware decoding mechanisms. However, these methods neglect the critical role of visual information in conflicts and suffer from redundant retrieved contexts, which impair accurate conflict identification and effective mitigation. To address these limitations, we propose CC-VQA: a novel training-free, conflict- and correlation-aware method for KB-VQA. Our method comprises two core components: (1) Vision-Centric Contextual Conflict Reasoning, which performs visual-semantic conflict analysis across internal and external knowledge contexts; and (2) Correlation-Guided Encoding and Decoding, featuring positional encoding compression for low-correlation statements and adaptive decoding using correlation-weighted conflict scoring. Extensive evaluations on the E-VQA, InfoSeek, and OK-VQA benchmarks demonstrate that CC-VQA achieves state-of-the-art performance, yielding absolute accuracy improvements of 3.3% to 6.4% over existing methods. Code is available at https://github.com/cqu-student/CC-VQA.
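The adaptive decoding idea in component (2) can be illustrated with a small sketch. This is a hypothetical toy implementation, not the paper's actual method: the function name `adaptive_decode_logits`, the multiplicative `correlation * conflict` weighting, and the contrastive-style blend of context-conditioned versus parametric logits are all assumptions made for illustration. The intuition it captures is that retrieved context should override parametric knowledge only when the context is both highly correlated with the question and in genuine conflict with the model's internal answer.

```python
def adaptive_decode_logits(ctx_logits, param_logits, correlation, conflict, alpha=1.0):
    """Toy sketch of correlation-weighted conflict scoring (hypothetical).

    ctx_logits:   logits conditioned on the retrieved context
    param_logits: logits from parametric knowledge alone (no context)
    correlation:  in [0, 1], how relevant the retrieved context is
    conflict:     in [0, 1], how strongly the two knowledge sources disagree
    """
    # Conflict score weighted by correlation: a conflicting but irrelevant
    # context (low correlation) should barely shift the distribution.
    gamma = alpha * correlation * conflict
    # Contrastive-style blend: amplify the context-conditioned logits and
    # subtract the parametric ones in proportion to gamma.
    return [(1.0 + gamma) * c - gamma * p
            for c, p in zip(ctx_logits, param_logits)]

# Toy example with two candidate answers: the retrieved context favors
# answer 0, while parametric knowledge favors answer 1.
with_ctx = [2.0, 0.5]
param    = [0.5, 2.0]
adjusted = adaptive_decode_logits(with_ctx, param, correlation=0.9, conflict=0.8)
```

With high correlation and high conflict, the blend pushes the decision toward the context-supported answer; with `correlation=0` the adjusted logits reduce to the context-conditioned logits unchanged, so an irrelevant context cannot distort decoding.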