The Underappreciated Power of Vision Models for Graph Structural Understanding

Abstract

Graph Neural Networks (GNNs) operate through bottom-up message passing, which differs fundamentally from human visual perception: humans intuitively capture global structure first. We investigate the underappreciated potential of vision models for graph understanding and find that they achieve performance comparable to GNNs on established benchmarks while exhibiting distinctly different learning patterns. These divergent behaviors, combined with the limitations of existing benchmarks that conflate domain features with topological understanding, motivate our introduction of GraphAbstract. This benchmark evaluates a model's ability to perceive global graph properties as humans do, through four tasks: recognizing organizational archetypes, detecting symmetry, sensing connectivity strength, and identifying critical elements. Our results show that vision models significantly outperform GNNs on tasks requiring holistic structural understanding and generalize across varying graph scales, whereas GNNs struggle with global pattern abstraction and degrade in performance as graph size increases. This work demonstrates that vision models possess remarkable yet underutilized capabilities for graph structural understanding, particularly for problems requiring global topological awareness and scale-invariant reasoning. These findings open new avenues for developing more effective graph foundation models for tasks dominated by holistic pattern recognition.
