Advancing Molecular Machine (Learned) Representations with Stereoelectronics-Infused Molecular Graphs
August 8, 2024
Authors: Daniil A. Boiko, Thiago Reschützegger, Benjamin Sanchez-Lengeling, Samuel M. Blau, Gabe Gomes
cs.AI
Abstract
Molecular representation is a foundational element in our understanding of
the physical world. Its importance ranges from the fundamentals of chemical
reactions to the design of new therapies and materials. Previous molecular
machine learning models have employed strings, fingerprints, global features,
and simple molecular graphs that are inherently information-sparse
representations. However, as the complexity of prediction tasks increases, the
molecular representation needs to encode higher fidelity information. This work
introduces a novel approach to infusing quantum-chemical-rich information into
molecular graphs via stereoelectronic effects. We show that the explicit
addition of stereoelectronic interactions significantly improves the
performance of molecular machine learning models. Furthermore,
stereoelectronics-infused representations can be learned and deployed with a
tailored double graph neural network workflow, enabling their application to any
downstream molecular machine learning task. Finally, we show that the learned
representations allow for facile stereoelectronic evaluation of previously
intractable systems, such as entire proteins, opening new avenues of molecular
design.