nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Abstract

Methods of computational quantum chemistry provide accurate approximations of molecular properties that are crucial for computer-aided drug discovery and other areas of chemical science. However, their high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark, nabla^2DFT, that is based on nablaDFT. It contains twice as many molecular structures, three times as many conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level (omegaB97X-D/def2-SVP) for each conformation. Moreover, nabla^2DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs on molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
