Structural Graph Probing of Vision-Language Models

Abstract

Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
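
To make the idea of a within-layer correlation graph concrete, the following is a minimal sketch and not the authors' implementation (the linked repository is the authoritative source). It assumes Pearson correlation between neuron activation series, an absolute-value threshold of 0.5 for drawing edges, and node degree as a stand-in for a hub criterion; the abstract specifies none of these choices. The function names `pearson`, `correlation_graph`, and `top_hubs` and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_graph(activations, threshold=0.5):
    """Build an undirected graph over the neurons of one layer.

    activations: list of per-token rows, each a list of neuron activations.
    An edge joins neurons i and j when |corr(i, j)| >= threshold.
    The threshold value is an illustrative assumption.
    """
    columns = list(zip(*activations))  # one activation series per neuron
    n = len(columns)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def top_hubs(adjacency, k=1):
    """Rank neurons by degree (ties broken by index); an assumed hub criterion."""
    return sorted(adjacency, key=lambda i: (-len(adjacency[i]), i))[:k]


# Toy layer: neurons 0-2 follow a shared signal, neuron 3 is independent noise.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(200)]
toy = [
    [s, 2 * s + 0.1 * rng.gauss(0, 1), -s + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)]
    for s in shared
]
graph = correlation_graph(toy, threshold=0.5)
hubs = top_hubs(graph, k=3)
```

On this toy data, neurons 0, 1, and 2 end up mutually connected while neuron 3 stays isolated. With real activations, the pairwise loop would typically be replaced by a vectorized correlation matrix, and a perturbation study would then zero or rescale the selected hub neurons and compare model outputs.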
