Advancing Molecular Machine (Learned) Representations with Stereoelectronics-Infused Molecular Graphs
August 8, 2024
Authors: Daniil A. Boiko, Thiago Reschützegger, Benjamin Sanchez-Lengeling, Samuel M. Blau, Gabe Gomes
cs.AI
Abstract
Molecular representation is a foundational element in our understanding of
the physical world. Its importance ranges from the fundamentals of chemical
reactions to the design of new therapies and materials. Previous molecular
machine learning models have employed strings, fingerprints, global features,
and simple molecular graphs that are inherently information-sparse
representations. However, as the complexity of prediction tasks increases, the
molecular representation needs to encode higher fidelity information. This work
introduces a novel approach to infusing quantum-chemical-rich information into
molecular graphs via stereoelectronic effects. We show that the explicit
addition of stereoelectronic interactions significantly improves the
performance of molecular machine learning models. Furthermore,
stereoelectronics-infused representations can be learned and deployed with a
tailored double graph neural network workflow, enabling their application to any
downstream molecular machine learning task. Finally, we show that the learned
representations allow for facile stereoelectronic evaluation of previously
intractable systems, such as entire proteins, opening new avenues of molecular
design.