Structural Graph Probing of Vision-Language Models
March 28, 2026
Authors: Haoyu He, Yue Zhuo, Yu Zheng, Qi R. Wang
cs.AI
Abstract
Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
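The core construction described above, building a within-layer graph from neuron-neuron co-activations and then looking for high-degree hub neurons, can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes Pearson correlation over a probe set of activations and a simple absolute-value threshold, and the function names (`correlation_graph`, `hub_neurons`) and the `threshold`/`top_k` parameters are illustrative choices, not taken from the released code.

```python
import numpy as np

def correlation_graph(activations: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build a within-layer correlation graph from neuron activations.

    activations: array of shape (num_samples, num_neurons) holding one
    layer's activations over a probe dataset.

    Returns a symmetric boolean adjacency matrix of shape
    (num_neurons, num_neurons) with an edge wherever the absolute
    Pearson correlation between two neurons exceeds the threshold.
    """
    # np.corrcoef treats rows as variables, so transpose to neurons-as-rows.
    corr = np.corrcoef(activations.T)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)  # drop self-loops
    return adj

def hub_neurons(adj: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k highest-degree neurons (candidate hubs)."""
    degree = adj.sum(axis=0)
    return np.argsort(degree)[::-1][:top_k]

# Toy usage: three neurons driven by a shared latent signal plus five
# independent neurons; the correlated trio should form a clique and
# surface as the highest-degree "hub" candidates.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
acts = np.hstack([
    shared + 0.1 * rng.normal(size=(200, 3)),  # co-activating group
    rng.normal(size=(200, 5)),                 # independent neurons
])
adj = correlation_graph(acts, threshold=0.5)
hubs = hub_neurons(adj, top_k=3)
```

In a real VLM analysis the activations would come from a forward hook on a given layer, computed separately per modality (image vs. text tokens) so that graphs can be compared across modalities and depth; the recurrence of the same hub indices across layers is what the abstract refers to as consolidation.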