Transformers Discover Molecular Structure Without Graph Priors

Abstract

Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs, often induced by a fixed radius cutoff or a k-nearest-neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity because of its fixed receptive field and can slow down inference through sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinates, without predefined graphs or physical priors, can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors on the OMol25 dataset, under a training compute budget matched to that of a state-of-the-art equivariant GNN. We discover that the Transformer learns physically consistent patterns, such as attention weights that decay inversely with interatomic distance, and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.
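To make the notion of a predefined graph concrete, the following is a minimal NumPy sketch of the two graph constructions named in the abstract, a fixed radius cutoff and k-nearest neighbors. It is an illustration only and not the code used in this work; the function names, the toy coordinates, and the cutoff value are all assumptions chosen for the example.

```python
import numpy as np

def pairwise_distances(pos):
    """Return the (N, N) matrix of Euclidean distances between atoms."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def radius_graph(pos, cutoff):
    """Directed edges (i, j), i != j, whose distance is below `cutoff`."""
    dist = pairwise_distances(pos)
    mask = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)
    return np.argwhere(mask)

def knn_graph(pos, k):
    """Directed edges from each atom to its k nearest neighbors."""
    dist = pairwise_distances(pos)
    np.fill_diagonal(dist, np.inf)  # exclude self-edges
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(pos)), k)
    return np.stack([sources, neighbors.ravel()], axis=1)

# Hypothetical toy geometry (arbitrary units): three atoms on a line.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
radius_edges = radius_graph(pos, cutoff=1.5)  # atom 2 receives no edges
knn_edges = knn_graph(pos, k=1)               # every atom gets one edge
```

In this toy case the radius cutoff leaves the third atom disconnected, which illustrates the fixed receptive field mentioned above. A Transformer operating on the coordinates directly would instead compute attention over all atom pairs and learn how strongly each pair should interact.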
