Transformers Discover Molecular Structure Without Graph Priors
October 2, 2025
Authors: Tobias Kreiman, Yutong Bai, Fadi Atieh, Elizabeth Weaver, Eric Qu, Aditi S. Krishnapriyan
cs.AI
Abstract
Graph Neural Networks (GNNs) are the dominant architecture for molecular
machine learning, particularly for molecular property prediction and machine
learning interatomic potentials (MLIPs). GNNs perform message passing on
predefined graphs often induced by a fixed radius cutoff or k-nearest neighbor
scheme. While this design aligns with the locality present in many molecular
tasks, a hard-coded graph can limit expressivity due to the fixed receptive
field and slow down inference through sparse graph operations. In this work, we
investigate whether pure, unmodified Transformers trained directly on Cartesian
coordinates, without predefined graphs or physical priors, can approximate
molecular energies and forces. As a
starting point for our analysis, we demonstrate how to train a Transformer to
competitive energy and force mean absolute errors under a matched training
compute budget, relative to a state-of-the-art equivariant GNN on the OMol25
dataset. We discover that the Transformer learns physically consistent
patterns, such as attention weights that decay inversely with
interatomic distance, and flexibly adapts them across different
molecular environments due to the absence of hard-coded biases. The use of a
standard Transformer also unlocks predictable improvements with respect to
scaling training resources, consistent with empirical scaling laws observed in
other domains. Our results demonstrate that many favorable properties of GNNs
can emerge adaptively in Transformers, challenging the necessity of hard-coded
graph inductive biases and pointing toward standardized, scalable architectures
for molecular modeling.
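
For concreteness, below is a minimal sketch, not the authors' implementation, of the setup the abstract describes: an unmodified Transformer encoder that consumes atomic numbers and raw Cartesian coordinates (no radius-cutoff or k-nearest-neighbor graph) and predicts a total energy, with forces obtained as the negative gradient of that energy with respect to the coordinates. The class name, the linear coordinate embedding, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of a plain Transformer
# potential trained directly on Cartesian coordinates.
import torch
import torch.nn as nn


class PlainTransformerPotential(nn.Module):  # hypothetical name
    def __init__(self, num_elements=100, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        self.element_embed = nn.Embedding(num_elements, d_model)
        self.coord_proj = nn.Linear(3, d_model)  # raw xyz, no graph construction
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.energy_head = nn.Linear(d_model, 1)  # per-atom energy contribution

    def forward(self, atomic_numbers, positions):
        # atomic_numbers: (B, N) long tensor, positions: (B, N, 3) float tensor
        tokens = self.element_embed(atomic_numbers) + self.coord_proj(positions)
        h = self.encoder(tokens)                   # full attention over all atoms
        per_atom_energy = self.energy_head(h).squeeze(-1)  # (B, N)
        return per_atom_energy.sum(dim=-1)                 # total energy, (B,)


def energy_and_forces(model, atomic_numbers, positions):
    """Forces as the negative gradient of the predicted energy."""
    positions = positions.requires_grad_(True)
    energy = model(atomic_numbers, positions)
    forces = -torch.autograd.grad(energy.sum(), positions, create_graph=True)[0]
    return energy, forces
```

Because every atom attends to every other atom, the receptive field is not bounded by a fixed cutoff; any locality that helps the task has to be learned from data rather than hard-coded into the architecture.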
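The attention-decay observation can be probed with a simple diagnostic. The sketch below is an assumed analysis, not the paper's protocol: it pairs one head's attention weights with the corresponding interatomic distances and fits a slope in log-log space, where a slope near -1 would indicate roughly 1/r decay.

```python
# Sketch (assumed analysis) of relating attention weights to interatomic distance.
import numpy as np


def attention_vs_distance(attn, positions):
    """attn: (N, N) attention weights for one head; positions: (N, 3) coordinates.
    Returns pairwise distances, matching attention weights, and a log-log slope."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    mask = ~np.eye(len(positions), dtype=bool)   # drop self-attention entries
    r = dists[mask]
    w = attn[mask]
    # Slope near -1 in log-log space is consistent with ~1/r decay.
    slope, _intercept = np.polyfit(np.log(r), np.log(w + 1e-12), deg=1)
    return r, w, slope
```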