Generating π-Functional Molecules Using STGG+ with Active Learning

Abstract

Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. Supervised learning methods generate high-quality molecules similar to those in a dataset, but they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces, but it often engages in "reward hacking" and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, into an active learning loop. Our approach iteratively generates molecules with STGG+, evaluates them, and fine-tunes STGG+ on the results to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic π-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range. The generated molecules are validated and rationalized in silico with time-dependent density functional theory. Our results demonstrate that our method is highly effective at generating novel molecules with high oscillator strength, unlike existing methods such as reinforcement learning (RL). We open-source our active-learning code along with our Conjugated-xTB dataset, which contains 2.9 million π-conjugated molecules, and the function for approximating the oscillator strength and absorption wavelength (based on sTDA-xTB).
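The generate, evaluate, and fine-tune cycle described above can be illustrated with a toy sketch. This is not the STGG+AL implementation: the abstract does not give its details, so every name here (`ToyGenerator`, `toy_oracle`, `active_learning_loop`, and the parameters `rounds`, `n_generate`, `top_k`) is a hypothetical stand-in. A "molecule" is reduced to a single number, the generator to a Gaussian, and the property evaluator to a simple function, purely to show the shape of the loop.

```python
import random
from dataclasses import dataclass


def toy_oracle(x: float) -> float:
    """Hypothetical property evaluator (stand-in for an sTDA-xTB-style score).

    The score is highest at x = 5, far from where the generator starts.
    """
    return -(x - 5.0) ** 2


@dataclass
class ToyGenerator:
    """Hypothetical generative model: a Gaussian over one number."""
    mean: float = 0.0
    std: float = 1.0

    def generate(self, n: int, rng: random.Random) -> list[float]:
        # Sample n candidates from the current distribution.
        return [rng.gauss(self.mean, self.std) for _ in range(n)]

    def fine_tune(self, samples: list[float]) -> None:
        # "Fine-tuning" here just recenters the Gaussian on the given samples.
        self.mean = sum(samples) / len(samples)


def active_learning_loop(model, oracle, rounds=10, n_generate=200, top_k=20, seed=0):
    """Run a generate -> evaluate -> fine-tune loop; return model and best score per round."""
    rng = random.Random(seed)
    dataset = []          # accumulated (candidate, score) pairs
    best_per_round = []
    for _ in range(rounds):
        candidates = model.generate(n_generate, rng)       # 1) generate
        dataset.extend((c, oracle(c)) for c in candidates)  # 2) evaluate
        dataset.sort(key=lambda pair: pair[1], reverse=True)
        elite = [c for c, _ in dataset[:top_k]]
        model.fine_tune(elite)                              # 3) fine-tune on the best so far
        best_per_round.append(dataset[0][1])
    return model, best_per_round


if __name__ == "__main__":
    model, history = active_learning_loop(ToyGenerator(), toy_oracle)
    print(f"final mean: {model.mean:.2f}, best score: {history[-1]:.4f}")
```

Because the evaluated candidates accumulate across rounds, the best score recorded per round can never decrease, and the generator drifts toward a region it could not reach from its starting distribution in one step. A real system would replace the Gaussian with a molecular generative model and the toy score with a physics-based property calculation.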
