Graph Mamba: Towards Learning on Graphs with State Space Models

Abstract

Graph Neural Networks (GNNs) have shown promising potential in graph representation learning. Most GNNs define a local message-passing mechanism that propagates information over the graph by stacking multiple layers. These methods, however, are known to suffer from two major limitations: over-squashing and poor capture of long-range dependencies. Recently, Graph Transformers (GTs) have emerged as a powerful alternative to Message-Passing Neural Networks (MPNNs). GTs, however, have quadratic computational cost, lack inductive biases on graph structures, and rely on complex Structural/Positional Encodings (SE/PE). In this paper, we show that while Transformers, complex message-passing, and SE/PE are sufficient for good performance in practice, none of them is necessary. Motivated by the recent success of State Space Models (SSMs) such as Mamba, we present Graph Mamba Networks (GMNs), a general framework for a new class of GNNs based on selective SSMs. We discuss and categorize the new challenges that arise when adapting SSMs to graph-structured data, and present four required steps and one optional step for designing GMNs: (1) Neighborhood Tokenization, (2) Token Ordering, (3) Architecture of Bidirectional Selective SSM Encoder, (4) Local Encoding, and the dispensable (5) PE and SE. We further provide theoretical justification for the power of GMNs. Experiments demonstrate that, at much lower computational cost, GMNs attain outstanding performance on long-range, small-scale, large-scale, and heterophilic benchmark datasets.
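The long-range limitation of local message passing can be illustrated with a minimal sketch. It assumes mean aggregation with self-loops on a toy path graph; the function names and the graph are hypothetical and are not taken from the paper. It only shows that information from a source node travels one hop per layer, so a dependency spanning k hops needs at least k stacked layers.

```python
def message_passing_round(adj, feats):
    """One round of local message passing: each node averages its own
    feature with the features of its direct neighbors."""
    new_feats = []
    for node, neighbors in enumerate(adj):
        group = [node] + neighbors
        new_feats.append(sum(feats[j] for j in group) / len(group))
    return new_feats


def nodes_reached(adj, source, num_layers):
    """Indices of nodes with a nonzero feature after `num_layers` rounds,
    when only `source` starts with a nonzero feature."""
    feats = [0.0] * len(adj)
    feats[source] = 1.0
    for _ in range(num_layers):
        feats = message_passing_round(adj, feats)
    return [i for i, value in enumerate(feats) if value > 0.0]


# Hypothetical toy graph (an assumption for illustration): path 0-1-2-3-4-5.
n = 6
path_adj = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

for layers in range(n):
    # Each extra layer extends the reach from node 0 by exactly one hop.
    print(layers, nodes_reached(path_adj, source=0, num_layers=layers))
```

On this path graph, node 5 receives nothing from node 0 until five layers are stacked, which is the kind of depth requirement that motivates sequence-style encoders over longer token sequences.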
