∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials
June 20, 2024
Authors: Kuzma Khrabrov, Anton Ber, Artem Tsypin, Konstantin Ushenin, Egor Rumiantsev, Alexander Telepov, Dmitry Protasov, Ilya Shenbin, Anton Alekseev, Mikhail Shirokikh, Sergey Nikolenko, Elena Tutubalina, Artur Kadurin
cs.AI
Abstract
Methods of computational quantum chemistry provide accurate approximations of
molecular properties crucial for computer-aided drug discovery and other areas
of chemical science. However, high computational complexity limits the
scalability of their applications. Neural network potentials (NNPs) are a
promising alternative to quantum chemistry methods, but they require large and
diverse datasets for training. This work presents ∇²DFT, a new dataset and
benchmark based on nablaDFT. It contains twice as many molecular structures
and three times as many conformations as nablaDFT, along with new data types,
new tasks, and state-of-the-art models. The dataset includes energies, forces,
17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction
object. For each conformation, all calculations were performed at the
ωB97X-D/def2-SVP level of DFT. Moreover, ∇²DFT is the
first dataset that contains relaxation trajectories for a substantial number of
drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in
molecular property prediction, Hamiltonian prediction, and conformational
optimization tasks. Finally, we propose an extendable framework for training
NNPs and implement 10 models within it.
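
To make the terms "relaxation trajectory" and "conformational optimization" concrete, here is a minimal toy sketch, not the authors' framework or its API. It assumes a one-dimensional harmonic bond as a stand-in for a trained NNP, and every name and constant in it (`energy_and_force`, `relax`, `k`, `r0`, the step size) is hypothetical. The optimizer follows the force, which is the negative gradient of the energy, and records each intermediate geometry together with its energy.

```python
# Toy illustration only: a harmonic "bond" stands in for a trained NNP.
# All names and constants here are hypothetical, not part of nablaDFT.

def energy_and_force(r, k=1.0, r0=1.0):
    """Return the energy and force of a 1D harmonic bond of length r."""
    energy = 0.5 * k * (r - r0) ** 2
    force = -k * (r - r0)  # force is the negative gradient of the energy
    return energy, force


def relax(r_start, step=0.1, force_tol=1e-6, max_steps=1000):
    """Steepest descent: follow the force until it is nearly zero.

    Returns the relaxation trajectory as a list of (bond_length, energy).
    """
    r = r_start
    trajectory = []
    for _ in range(max_steps):
        energy, force = energy_and_force(r)
        trajectory.append((r, energy))
        if abs(force) < force_tol:
            break
        r += step * force  # move the geometry along the force
    return trajectory


trajectory = relax(r_start=1.5)
final_r, final_energy = trajectory[-1]
```

In a realistic setting the harmonic function would be replaced by DFT or by an NNP that predicts energies and forces for a full 3D geometry, and steepest descent would usually give way to a quasi-Newton optimizer. The structure of the loop and of the recorded trajectory stays the same.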