Beyond the Linear Separability Ceiling

Abstract

Most state-of-the-art Visual-Language Models (VLMs) appear to be limited by the linear separability of their visual embeddings on abstract reasoning tasks. This work investigates this "linear reasoning bottleneck" by introducing the Linear Separability Ceiling (LSC): the performance of a simple linear classifier on a VLM's visual embeddings. We find that this bottleneck is widespread and stems not from poor perception but from failures in the language model's reasoning pathways. We demonstrate that this is a solvable alignment issue. The required intervention, however, is task-dependent: activating existing pathways suffices for semantic concepts, while complex relational reasoning requires adapting core model weights. Using postfix tuning as a methodological control, we find strong evidence for powerful but dormant reasoning pathways within VLMs. For complex relational tasks that require deeper adaptation, however, explicitly improving representation quality causes the model to fail on new prompt formats even though its embeddings remain well separated. Ultimately, this work provides a new lens for VLM analysis, showing that robust reasoning is a matter of targeted alignment, not simply of improved representation learning.
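To make the LSC definition concrete, the following is a minimal sketch of a linear probe, assuming the visual embeddings have already been extracted and frozen. It is not the authors' implementation: the perceptron training rule, the synthetic Gaussian "embeddings", and every name in it (`train_linear_probe`, `probe_accuracy`, `make_toy_embeddings`) are illustrative assumptions. Only the idea of scoring a simple linear classifier on fixed embeddings comes from the abstract.

```python
import random


def make_toy_embeddings(n_per_class, dim, seed=0):
    """Hypothetical stand-in for VLM visual embeddings: two Gaussian clusters."""
    rng = random.Random(seed)
    embeddings, labels = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            # Class means sit at -1 and +1 on every axis; noise std is 0.3.
            embeddings.append([label + rng.gauss(0.0, 0.3) for _ in range(dim)])
            labels.append(label)
    return embeddings, labels


def train_linear_probe(embeddings, labels, epochs=50, lr=0.1):
    """Fit a single linear decision boundary (perceptron rule) on fixed embeddings."""
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):  # y is -1 or +1
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the boundary toward x
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def probe_accuracy(weights, bias, embeddings, labels):
    """Held-out accuracy of the probe; this plays the role of the LSC here."""
    correct = 0
    for x, y in zip(embeddings, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        prediction = 1 if score > 0 else -1
        correct += prediction == y
    return correct / len(labels)


# Train on one synthetic split and evaluate on another.
train_x, train_y = make_toy_embeddings(n_per_class=50, dim=8, seed=0)
test_x, test_y = make_toy_embeddings(n_per_class=50, dim=8, seed=1)
weights, bias = train_linear_probe(train_x, train_y)
lsc = probe_accuracy(weights, bias, test_x, test_y)
print(f"linear probe accuracy on held-out toy embeddings: {lsc:.2f}")
```

In this reading, the bottleneck described above would appear as a model whose end-to-end task accuracy does not exceed the probe accuracy computed on its own embeddings. The toy data only exercises the probe and says nothing about any real model.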
