
Transformers Discover Molecular Structure Without Graph Priors

October 2, 2025
Authors: Tobias Kreiman, Yutong Bai, Fadi Atieh, Elizabeth Weaver, Eric Qu, Aditi S. Krishnapriyan
cs.AI

Abstract

Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs, often induced by a fixed radius cutoff or k-nearest-neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slow down inference through sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinates – without predefined graphs or physical priors – can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patterns – such as attention weights that decay inversely with interatomic distance – and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.
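The abstract's central idea is that an unmodified Transformer can read raw Cartesian coordinates and atomic numbers, with no radius-cutoff or k-nearest-neighbor graph, and still predict energies and forces. The sketch below is a minimal, hypothetical illustration of that setup, not the authors' actual model: the class name `CoordinateTransformer`, the embedding choices, and all hyperparameters are assumptions, and forces are obtained as the negative gradient of the predicted energy with respect to coordinates, a standard MLIP convention.

```python
# Hypothetical sketch: a plain Transformer reading raw Cartesian coordinates.
# Names (CoordinateTransformer, d_model, ...) are illustrative, not the paper's code.
import torch
import torch.nn as nn

class CoordinateTransformer(nn.Module):
    def __init__(self, n_elements=100, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        self.embed_z = nn.Embedding(n_elements, d_model)   # atomic-number embedding
        self.embed_xyz = nn.Linear(3, d_model)              # raw Cartesian coordinates, no graph construction
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.energy_head = nn.Linear(d_model, 1)            # per-atom energy contribution

    def forward(self, z, pos):
        # z: (B, N) atomic numbers, pos: (B, N, 3) Cartesian coordinates
        h = self.embed_z(z) + self.embed_xyz(pos)
        h = self.encoder(h)                                  # full attention over all atom pairs
        return self.energy_head(h).squeeze(-1).sum(dim=-1)   # molecular energy, shape (B,)

# Forces as the negative gradient of energy w.r.t. positions (toy inputs).
model = CoordinateTransformer()
z = torch.randint(1, 20, (2, 12))                 # 2 molecules, 12 atoms each
pos = torch.randn(2, 12, 3, requires_grad=True)   # Cartesian coordinates
energy = model(z, pos)
forces = -torch.autograd.grad(energy.sum(), pos)[0]
```

Because attention is computed over all atom pairs, the model has no fixed receptive field; any locality it exhibits must be learned rather than imposed by a cutoff graph.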
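The paper's observation that attention weights decay roughly with inverse interatomic distance can be probed with a simple correlation check. The helper below is a hypothetical sketch of such an analysis; `attn` and `pos` stand in for an attention map and coordinates captured from a trained model, and the function name and epsilon handling are assumptions rather than the authors' analysis code.

```python
# Hypothetical probe of the "attention decays ~1/r" observation: correlate one
# head's attention weights with inverse pairwise distances.
import torch

def attention_vs_inverse_distance(attn, pos, eps=1e-6):
    """attn: (N, N) attention weights for one head; pos: (N, 3) coordinates."""
    dist = torch.cdist(pos, pos).clamp(min=eps)      # pairwise distances, (N, N)
    inv_dist = 1.0 / dist
    mask = ~torch.eye(len(pos), dtype=torch.bool)    # drop self-attention terms
    a, d = attn[mask], inv_dist[mask]
    # Pearson correlation between attention and 1/r; values near 1 would indicate
    # a learned locality pattern despite the absence of a hard-coded cutoff.
    a, d = a - a.mean(), d - d.mean()
    return (a * d).sum() / (a.norm() * d.norm() + eps)
```

A high correlation for many heads would be consistent with the abstract's claim that GNN-like locality emerges adaptively in the Transformer rather than being built in.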