The Underappreciated Power of Vision Models for Graph Structural Understanding

Abstract

Graph Neural Networks operate through bottom-up message-passing, fundamentally differing from human visual perception, which intuitively captures global structures first. We investigate the underappreciated potential of vision models for graph understanding, finding they achieve performance comparable to GNNs on established benchmarks while exhibiting distinctly different learning patterns. These divergent behaviors, combined with limitations of existing benchmarks that conflate domain features with topological understanding, motivate our introduction of GraphAbstract. This benchmark evaluates models' ability to perceive global graph properties as humans do: recognizing organizational archetypes, detecting symmetry, sensing connectivity strength, and identifying critical elements. Our results reveal that vision models significantly outperform GNNs on tasks requiring holistic structural understanding and maintain generalizability across varying graph scales, while GNNs struggle with global pattern abstraction and degrade with increasing graph size. This work demonstrates that vision models possess remarkable yet underutilized capabilities for graph structural understanding, particularly for problems requiring global topological awareness and scale-invariant reasoning. These findings open new avenues to leverage this underappreciated potential for developing more effective graph foundation models for tasks dominated by holistic pattern recognition.
