Beyond the Linear Separability Ceiling
July 10, 2025
Authors: Enrico Vompa, Tanel Tammet, Mohit Vaishnav
cs.AI
Abstract
Most state-of-the-art Visual-Language Models (VLMs) are seemingly limited by
the linear separability of their visual embeddings on abstract reasoning tasks.
This work investigates this "linear reasoning bottleneck" by introducing the
Linear Separability Ceiling (LSC), the performance of a simple linear
classifier on a VLM's visual embeddings. We find this bottleneck is widespread
and stems not from poor perception, but from failures in the language model's
reasoning pathways. We demonstrate this is a solvable alignment issue. The
required intervention, however, is task-dependent: activating existing pathways
suffices for semantic concepts, while complex relational reasoning requires
adapting core model weights. Using postfix tuning as a methodological control,
we find strong evidence for powerful, dormant reasoning pathways within VLMs.
However, for complex relational tasks requiring deeper adaptation, explicitly
improving representation quality causes the model to fail on new prompt formats
despite its embeddings remaining well separated. Ultimately, this work provides
a new lens for VLM analysis, showing that robust reasoning is a matter of
targeted alignment, not simply improved representation learning.
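
A minimal sketch of how an LSC-style score could be estimated, assuming pooled VLM visual embeddings are available as a NumPy array. The classifier choice (scikit-learn LogisticRegression), the train/test split, and the variable names `embeddings` and `labels` are illustrative assumptions, not the paper's exact probing protocol.

```python
# Sketch: estimate a Linear Separability Ceiling (LSC)-style score by fitting
# a simple linear probe on frozen visual embeddings (hypothetical data shapes).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in data so the sketch runs end to end; replace with real pooled
# visual embeddings (N x D) and task labels (N,).
embeddings = rng.normal(size=(1000, 768))
labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0, stratify=labels
)

# A plain linear classifier; its held-out accuracy serves as the LSC estimate.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
lsc_estimate = probe.score(X_test, y_test)
print(f"Linear probe accuracy (LSC estimate): {lsc_estimate:.3f}")
```

In the spirit of the abstract, comparing the VLM's end-to-end task accuracy against this probe accuracy is what would reveal a "linear reasoning bottleneck": a model performing at or below the probe despite well-separated embeddings points to the language model's reasoning pathways rather than perception.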