nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Abstract

Methods of computational quantum chemistry provide accurate approximations of molecular properties that are crucial for computer-aided drug discovery and other areas of chemical science. However, their high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents nabla^2DFT, a new dataset and benchmark that builds on nablaDFT. Compared with nablaDFT, it contains twice as many molecular structures and three times as many conformations, along with new data types and tasks and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level (omegaB97X-D/def2-SVP) for each conformation. Moreover, nabla^2DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs on molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extensible framework for training NNPs and implement 10 models within it.
