Beyond the Linear Separability Ceiling
July 10, 2025
Authors: Enrico Vompa, Tanel Tammet, Mohit Vaishnav
cs.AI
Abstract
Most state-of-the-art Visual-Language Models (VLMs) are seemingly limited by
the linear separability of their visual embeddings on abstract reasoning tasks.
This work investigates this "linear reasoning bottleneck" by introducing the
Linear Separability Ceiling (LSC), the performance of a simple linear
classifier on a VLM's visual embeddings. We find this bottleneck is widespread
and stems not from poor perception, but from failures in the language model's
reasoning pathways. We demonstrate this is a solvable alignment issue. The
required intervention, however, is task-dependent: activating existing pathways
suffices for semantic concepts, while complex relational reasoning requires
adapting core model weights. Using postfix tuning as a methodological control,
we find strong evidence for powerful, dormant reasoning pathways within VLMs.
However, for complex relational tasks requiring deeper adaptation, explicitly
improving representation quality causes the model to fail on new prompt formats
despite its embeddings remaining well separated. Ultimately, this work provides
a new lens for VLM analysis, showing that robust reasoning is a matter of
targeted alignment, not simply improved representation learning.
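
To make the LSC concrete, here is a minimal sketch of how such a ceiling could be computed: fit a simple linear probe on frozen VLM visual embeddings and take its held-out accuracy. The use of scikit-learn's LogisticRegression, the function name, and the 80/20 split are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of the Linear Separability Ceiling (LSC): the held-out
# accuracy of a simple linear classifier fit on frozen VLM visual embeddings.
# Embedding extraction and the data split are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_separability_ceiling(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Fit a linear probe on visual embeddings and return held-out accuracy.

    embeddings: (n_samples, d) array of frozen VLM visual embeddings.
    labels:     (n_samples,) array of task labels.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.2, random_state=0, stratify=labels
    )
    probe = LogisticRegression(max_iter=1000)  # the simple linear classifier
    probe.fit(X_train, y_train)
    return probe.score(X_test, y_test)  # accuracy = the LSC for this task

# Hypothetical usage: a VLM whose end-to-end task accuracy falls below this
# value is, on the paper's account, bottlenecked by its reasoning pathways
# rather than by its perception.
# lsc = linear_separability_ceiling(visual_embeddings, task_labels)
```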