Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
April 13, 2026
Author: Md Tanvirul Alam
cs.AI
Abstract
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. Project page, code, and dataset available at https://maveryn.github.io/vlm-fix/.
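As a rough illustration of the headline metric (the abstract does not give an explicit formula, so the function name and paired-accuracy formulation below are assumptions), the semantic-fixation gap can be read as the difference in accuracy between the standard-rule and inverse-rule evaluations of the same terminal board states:

```python
def semantic_fixation_gap(standard_correct, inverse_correct):
    """Accuracy under standard rules minus accuracy under inverse rules,
    computed over paired board states. A positive gap indicates the model
    favors the familiar (standard) semantic mapping."""
    assert len(standard_correct) == len(inverse_correct), "evaluations must be paired"
    acc_standard = sum(standard_correct) / len(standard_correct)
    acc_inverse = sum(inverse_correct) / len(inverse_correct)
    return acc_standard - acc_inverse

# Hypothetical example: a model judges 9/10 boards correctly under
# standard rules but only 5/10 under the inverse formulation.
gap = semantic_fixation_gap([1] * 9 + [0], [1] * 5 + [0] * 5)
# gap = 0.9 - 0.5 = 0.4
```

Because both rule formulations are evaluated on identical board states, a nonzero gap cannot be attributed to perception difficulty; it isolates the rule-mapping failure the paper targets.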