Structural Graph Probing of Vision-Language Models

Abstract


Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries a recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
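The abstract does not say exactly how a within-layer correlation graph is built or how hubs are scored. The following is a minimal sketch of one plausible reading and is not the code from the linked repository. It assumes Pearson correlation between neurons computed across many inputs, an absolute-correlation threshold (the helper names `correlation_graph` and `top_hubs` and the `threshold` value are hypothetical), and node degree as the hub score. The activations are synthetic: a shared latent factor makes neurons 0 to 4 co-activate.

```python
import numpy as np


def correlation_graph(acts: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build a within-layer correlation graph from one layer's activations.

    acts: array of shape (num_samples, num_neurons), one row per input
    (hypothetical format, e.g. pooled hidden states of a single layer).
    Returns a symmetric boolean adjacency matrix (num_neurons x num_neurons).
    """
    # Pearson correlation between neurons (columns), hence rowvar=False.
    corr = np.corrcoef(acts, rowvar=False)
    # Constant neurons yield NaN correlations; treat them as unconnected.
    corr = np.nan_to_num(corr, nan=0.0)
    # Assumed edge rule: keep strong co-activations of either sign.
    adj = np.abs(corr) >= threshold
    # Remove self-loops (every neuron correlates perfectly with itself).
    np.fill_diagonal(adj, False)
    return adj


def top_hubs(adj: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-degree neurons (degree as hub score)."""
    degree = adj.sum(axis=1)
    # argsort is ascending, so reverse it to get the highest degrees first.
    return np.argsort(degree, kind="stable")[::-1][:k]


# Synthetic example: 1000 inputs, 20 neurons; neurons 0-4 share a latent factor.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 1))
acts = rng.standard_normal((1000, 20))
acts[:, :5] += 3.0 * latent

adj = correlation_graph(acts, threshold=0.5)
print(sorted(top_hubs(adj, 5).tolist()))  # → [0, 1, 2, 3, 4]
```

Degree is the simplest hub score. Centrality measures such as eigenvector centrality would be drop-in alternatives. To look for hubs that recur across depth, one could intersect the top-k sets computed for each layer. That step is also an assumption and is not stated in the abstract.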
