Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

Abstract

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. The project page, code, and dataset are available at https://maveryn.github.io/vlm-fix/.
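The abstract does not name the four games or the exact label format, so the following is only a minimal sketch, under stated assumptions, of what "identical terminal board states under paired standard and inverse rule formulations" can mean in practice. It assumes tic-tac-toe as a stand-in game, takes the inverse rule to be "completing a line of three loses", and computes the gap as standard-rule accuracy minus inverse-rule accuracy on the same boards. The board encoding and every function name (`ground_truth`, `accuracy`, `fixation_gap`) are hypothetical and are not the VLM-Fix code or API.

```python
from typing import Optional, Sequence

# Hypothetical stand-in game (not from VLM-Fix): tic-tac-toe on a 3x3 board
# encoded as a 9-character string of "X", "O", or "." (empty), row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def line_owner(board: str) -> Optional[str]:
    """Return the player who completed a line of three, or None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def ground_truth(board: str, rule: str) -> str:
    """Label the same terminal board under the standard or inverse rule."""
    owner = line_owner(board)
    if owner is None:
        return "draw"                       # a draw is rule-invariant
    if rule == "standard":                  # assumed: completing a line wins
        return owner
    if rule == "inverse":                   # assumed: completing a line loses
        return "O" if owner == "X" else "X"
    raise ValueError(f"unknown rule: {rule!r}")


def accuracy(preds: Sequence[str], boards: Sequence[str], rule: str) -> float:
    """Fraction of predictions matching the ground truth under `rule`."""
    hits = sum(p == ground_truth(b, rule) for p, b in zip(preds, boards))
    return hits / len(boards)


def fixation_gap(preds_std: Sequence[str], preds_inv: Sequence[str],
                 boards: Sequence[str]) -> float:
    """Standard-rule accuracy minus inverse-rule accuracy on identical boards."""
    return (accuracy(preds_std, boards, "standard")
            - accuracy(preds_inv, boards, "inverse"))


if __name__ == "__main__":
    boards = ["XXXOO....", "OOOXX.X..", "XOXXOOOXX"]
    # A fully "fixated" model: it ignores the stated rule and always
    # answers with the standard-rule winner, whichever prompt it receives.
    fixated = [ground_truth(b, "standard") for b in boards]
    print(fixation_gap(fixated, fixated, boards))
```

Because a draw receives the same label under both rules, only decisive boards can separate a fixated model from one that follows the stated rule. In this toy run the fixated model is perfect under the standard rule but matches the inverse rule only on the drawn board, giving a gap of 2/3; a model that applies each stated rule correctly would have a gap of 0.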