The Underappreciated Power of Vision Models for Graph Structural Understanding
October 27, 2025
Authors: Xinjian Zhao, Wei Pang, Zhongkai Xue, Xiangru Jian, Lei Zhang, Yaoyao Xu, Xiaozhuang Song, Shu Wu, Tianshu Yu
cs.AI
Abstract
Graph Neural Networks (GNNs) operate through bottom-up message passing,
fundamentally differing from human visual perception, which intuitively
captures global structures first. We investigate the underappreciated potential
of vision models for graph understanding, finding that they achieve performance
comparable to GNNs on established benchmarks while exhibiting distinctly
different learning patterns. These divergent behaviors, combined with
limitations of existing benchmarks that conflate domain features with
topological understanding, motivate our introduction of GraphAbstract. This
benchmark evaluates models' ability to perceive global graph properties as
humans do: recognizing organizational archetypes, detecting symmetry, sensing
connectivity strength, and identifying critical elements. Our results reveal
that vision models significantly outperform GNNs on tasks requiring holistic
structural understanding and maintain generalizability across varying graph
scales, while GNNs struggle with global pattern abstraction and degrade with
increasing graph size. This work demonstrates that vision models possess
remarkable yet underutilized capabilities for graph structural understanding,
particularly for problems requiring global topological awareness and
scale-invariant reasoning. These findings open new avenues to leverage this
underappreciated potential for developing more effective graph foundation
models for tasks dominated by holistic pattern recognition.
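The abstract contrasts bottom-up message passing with holistic perception but does not spell out why locality can hide global structure. A classic illustration, not taken from this paper, is the 1-Weisfeiler-Leman (1-WL) color refinement test, which is known to upper-bound what standard message-passing GNNs can distinguish. The Python sketch below assumes uniform node features (so only topology matters) and uses hypothetical helper names (`cycle`, `wl_refine`, `num_components`). It compares a 6-cycle with two disjoint triangles: every node in both graphs has the same local view, yet one graph is connected and the other is not.

```python
from collections import Counter, deque


def cycle(nodes):
    """Hypothetical helper: adjacency dict for a cycle through the given node IDs."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}


def wl_refine(adj, rounds):
    """1-WL color refinement. Every node starts with the same color (no
    domain features); each round it combines its own color with the sorted
    multiset of its neighbours' colors, i.e. one round of local message passing."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Compress signatures to small integers using one shared palette.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors


def num_components(adj):
    """A global property: the number of connected components, found by BFS."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
    return count


# G: one 6-cycle (connected). H: two disjoint triangles (disconnected).
G = cycle([0, 1, 2, 3, 4, 5])
H = {**cycle([6, 7, 8]), **cycle([9, 10, 11])}

# Refine the disjoint union so both graphs are colored with the same palette.
colors = wl_refine({**G, **H}, rounds=5)
hist_G = Counter(colors[v] for v in G)
hist_H = Counter(colors[v] for v in H)

print(hist_G == hist_H)                      # True: identical local views
print(num_components(G), num_components(H))  # 1 2: different global structure
```

Under these assumptions, no number of refinement rounds separates the two graphs, because every node in both sees a degree-2 neighbourhood of identical nodes. A drawing of the two graphs makes the difference obvious at a glance, which is the kind of holistic cue the abstract attributes to vision models. This is only an intuition pump, not the paper's experimental evidence.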