

Structural Graph Probing of Vision-Language Models

March 28, 2026
Authors: Haoyu He, Yue Zhuo, Yu Zheng, Qi R. Wang
cs.AI

Abstract

Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
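To make the core construction concrete, below is a minimal sketch of how a within-layer correlation graph and its hub neurons might be computed from collected activations. This is an illustration, not the paper's implementation (see the linked repository for that): the function names, the Pearson-correlation choice, and the edge threshold are all assumptions for exposition.

```python
import numpy as np

def correlation_graph(activations: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build a within-layer correlation graph from neuron activations.

    activations: shape (num_samples, num_neurons), one layer's activations
    collected over a probe dataset. Returns a binary adjacency matrix
    over neurons. (Illustrative sketch; the paper's exact construction
    may differ.)
    """
    # Pearson correlation between every pair of neurons across samples.
    corr = np.corrcoef(activations, rowvar=False)  # (neurons, neurons)
    np.fill_diagonal(corr, 0.0)                    # drop self-loops
    # Keep only strongly co-activating neuron pairs as edges
    # (threshold is an assumed hyperparameter).
    return (np.abs(corr) >= threshold).astype(np.uint8)

def hub_neurons(adj: np.ndarray, top_k: int = 32) -> np.ndarray:
    """Rank neurons by graph degree; high-degree nodes are candidate hubs
    of the kind the abstract describes as targets for perturbation."""
    degree = adj.sum(axis=1)
    return np.argsort(degree)[::-1][:top_k]
```

Under this reading, neurons that recur as high-degree hubs across layers and modalities would be the candidates whose targeted perturbation the authors report as substantially altering model output.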