The Underappreciated Power of Vision Models for Graph Structural Understanding
October 27, 2025
Authors: Xinjian Zhao, Wei Pang, Zhongkai Xue, Xiangru Jian, Lei Zhang, Yaoyao Xu, Xiaozhuang Song, Shu Wu, Tianshu Yu
cs.AI
Abstract
Graph Neural Networks (GNNs) operate through bottom-up message passing,
fundamentally differing from human visual perception, which intuitively
captures global structures first. We investigate the underappreciated potential
of vision models for graph understanding, finding they achieve performance
comparable to GNNs on established benchmarks while exhibiting distinctly
different learning patterns. These divergent behaviors, combined with
limitations of existing benchmarks that conflate domain features with
topological understanding, motivate our introduction of GraphAbstract. This
benchmark evaluates models' ability to perceive global graph properties as
humans do: recognizing organizational archetypes, detecting symmetry, sensing
connectivity strength, and identifying critical elements. Our results reveal
that vision models significantly outperform GNNs on tasks requiring holistic
structural understanding and maintain generalizability across varying graph
scales, while GNNs struggle with global pattern abstraction and degrade with
increasing graph size. This work demonstrates that vision models possess
remarkable yet underutilized capabilities for graph structural understanding,
particularly for problems requiring global topological awareness and
scale-invariant reasoning. These findings open new avenues to leverage this
underappreciated potential for developing more effective graph foundation
models for tasks dominated by holistic pattern recognition.