Generating π-Functional Molecules Using STGG+ with Active Learning
February 20, 2025
Authors: Alexia Jolicoeur-Martineau, Yan Zhang, Boris Knyazev, Aristide Baratin, Cheng-Hao Liu
cs.AI
Abstract
Generating novel molecules with out-of-distribution properties is a major
challenge in molecular discovery. While supervised learning methods generate
high-quality molecules similar to those in a dataset, they struggle to
generalize to out-of-distribution properties. Reinforcement learning can
explore new chemical spaces but is prone to reward hacking and often generates
non-synthesizable molecules. In this work, we address this problem by
integrating a state-of-the-art supervised learning method, STGG+, into an active
learning loop. Our approach iteratively generates, evaluates, and fine-tunes
STGG+ to continuously expand its knowledge. We call this approach STGG+AL. We
apply STGG+AL to the design of organic π-functional materials, focusing on
two challenging tasks: 1) generating highly absorptive molecules characterized
by high oscillator strength and 2) designing absorptive molecules with
reasonable oscillator strength in the near-infrared (NIR) range. The generated
molecules are validated and rationalized in silico with time-dependent density
functional theory (TD-DFT). Our results demonstrate that, unlike existing
approaches such as reinforcement learning (RL), our method is highly effective
at generating novel molecules with high oscillator strength. We open-source
our active learning code along with our Conjugated-xTB dataset of 2.9
million π-conjugated molecules and the function for approximating the
oscillator strength and absorption wavelength (based on sTDA-xTB).
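The generate, evaluate, fine-tune cycle described above can be illustrated with a generic active learning loop. This is a minimal toy sketch under stated assumptions, not the authors' STGG+AL code. "Molecules" are plain numbers here, and several pieces are hypothetical. `oracle` stands in for an expensive property calculator such as an sTDA-xTB oscillator-strength estimate. `ToyGenerator` replaces STGG+ with a simple perturb-the-best heuristic. `rounds`, `batch`, `top_k`, and `step` are illustrative parameters. The loop starts from data far from the optimum, and each round feeds newly labeled candidates back into the generator. This lets it move outside its initial training distribution.

```python
import random
from dataclasses import dataclass, field


def oracle(x: float) -> float:
    """Hypothetical stand-in for an expensive property calculator
    (e.g. an sTDA-xTB oscillator-strength estimate). Peaks at x = 10."""
    return -(x - 10.0) ** 2


@dataclass
class ToyGenerator:
    """Hypothetical generator, NOT STGG+: 'fine-tuning' just stores labeled
    data, and generation perturbs the highest-scoring examples seen so far,
    loosely mimicking generation conditioned on high property values."""
    data: list = field(default_factory=list)  # (candidate, score) pairs
    top_k: int = 5
    step: float = 1.0

    def fine_tune(self, labeled):
        # Add newly labeled candidates to the training set.
        self.data.extend(labeled)

    def generate(self, n, rng):
        # Sample around the current top-k candidates.
        best = sorted(self.data, key=lambda p: p[1], reverse=True)[: self.top_k]
        return [rng.choice(best)[0] + rng.gauss(0.0, self.step) for _ in range(n)]


def active_learning(initial, rounds=20, batch=16, seed=0):
    rng = random.Random(seed)
    gen = ToyGenerator()
    gen.fine_tune([(x, oracle(x)) for x in initial])  # initial supervised training
    for _ in range(rounds):
        candidates = gen.generate(batch, rng)           # 1) generate
        labeled = [(x, oracle(x)) for x in candidates]  # 2) evaluate with the oracle
        gen.fine_tune(labeled)                          # 3) fine-tune on new labels
    return max(gen.data, key=lambda p: p[1])


best_x, best_score = active_learning(initial=[0.0, 0.5, 1.0, 1.5])
print(f"best candidate {best_x:.2f}, score {best_score:.3f}")
```

In a real setting, each oracle call is expensive, which is why the loop labels only a small batch per round rather than searching exhaustively.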