

CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

February 27, 2026
作者: Yuyang Hong, Jiaqi Gu, Yujin Lou, Lubin Fan, Qi Yang, Ying Wang, Kun Ding, Yue Wu, Shiming Xiang, Jieping Ye
cs.AI

Abstract

Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. However, conflicts arise between the static parametric knowledge in vision-language models (VLMs), acquired during pre-training, and dynamically retrieved information. The outputs either ignore retrieved contexts or integrate them inconsistently with parametric knowledge, posing substantial challenges for KB-VQA. Current knowledge-conflict mitigation methods are primarily adapted from language-based approaches, addressing context-level conflicts through engineered prompting strategies or context-aware decoding mechanisms. However, these methods neglect the critical role of visual information in conflicts and suffer from redundant retrieved contexts, which impair accurate conflict identification and effective mitigation. To address these limitations, we propose CC-VQA: a novel training-free, conflict- and correlation-aware method for KB-VQA. Our method comprises two core components: (1) Vision-Centric Contextual Conflict Reasoning, which performs visual-semantic conflict analysis across internal and external knowledge contexts; and (2) Correlation-Guided Encoding and Decoding, featuring positional encoding compression for low-correlation statements and adaptive decoding using correlation-weighted conflict scoring. Extensive evaluations on the E-VQA, InfoSeek, and OK-VQA benchmarks demonstrate that CC-VQA achieves state-of-the-art performance, yielding absolute accuracy improvements of 3.3% to 6.4% over existing methods. Code is available at https://github.com/cqu-student/CC-VQA.
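To make the second component concrete, here is a minimal sketch of what "adaptive decoding using correlation-weighted conflict scoring" could look like. This is an illustrative formulation, not the paper's exact algorithm: the function names, the normalized-weight aggregation, and the contrastive-style logit adjustment are all assumptions for exposition.

```python
import numpy as np

def correlation_weighted_conflict_score(conflicts, correlations):
    """Aggregate per-statement conflict indicators (each in [0, 1]) into one
    score, weighting each retrieved statement by its correlation with the
    question/image. Hypothetical aggregation, not the paper's exact formula."""
    w = np.asarray(correlations, dtype=float)
    w = w / w.sum()  # normalize correlation weights to sum to 1
    return float(np.dot(w, np.asarray(conflicts, dtype=float)))

def adaptive_decode(logits_param, logits_ctx, conflict_score, alpha=1.0):
    """Blend parametric and context-conditioned next-token logits.
    A higher conflict score pushes the distribution further toward the
    retrieved context via a contrastive-style adjustment (assumed form)."""
    logits_param = np.asarray(logits_param, dtype=float)
    logits_ctx = np.asarray(logits_ctx, dtype=float)
    gamma = alpha * conflict_score  # adaptive contrast strength
    return logits_ctx + gamma * (logits_ctx - logits_param)
```

Under this sketch, when retrieved statements that strongly correlate with the question also conflict with the model's parametric answer, the conflict score rises and decoding leans harder on the context-conditioned logits; low-correlation (redundant) statements contribute little to the score.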