∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

June 20, 2024
Authors: Kuzma Khrabrov, Anton Ber, Artem Tsypin, Konstantin Ushenin, Egor Rumiantsev, Alexander Telepov, Dmitry Protasov, Ilya Shenbin, Anton Alekseev, Mikhail Shirokikh, Sergey Nikolenko, Elena Tutubalina, Artur Kadurin
cs.AI

Abstract

Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called ∇²DFT that is based on ∇DFT. It contains twice as many molecular structures, three times as many conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level (ωB97X-D/def2-SVP) for each conformation. Moreover, ∇²DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
