Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

Abstract

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as **semantic fixation**: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce **VLM-Fix**, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. The project page, code, and dataset are available at https://maveryn.github.io/vlm-fix/.
