Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
October 16, 2024
Authors: Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi
cs.AI
Abstract
The ability to discover new materials with desirable properties is critical
for numerous applications, from helping mitigate climate change to advancing
next-generation computing hardware. AI has the potential to accelerate
materials discovery and design by exploring chemical space more effectively
than other computational methods or trial and error. While
substantial progress has been made on AI for materials data, benchmarks, and
models, a barrier that has emerged is the lack of publicly available training
data and open pre-trained models. To address this, we present a Meta FAIR
release of the Open Materials 2024 (OMat24) large-scale open dataset and an
accompanying set of pre-trained models. OMat24 contains over 110 million
density functional theory (DFT) calculations focused on structural and
compositional diversity. Our EquiformerV2 models achieve state-of-the-art
performance on the Matbench Discovery leaderboard and are capable of predicting
ground-state stability and formation energies to an F1 score above 0.9 and an
accuracy of 20 meV/atom, respectively. We explore the impact of model size,
auxiliary denoising objectives, and fine-tuning on performance across a range
of datasets including OMat24, MPtraj, and Alexandria. The open release of the
OMat24 dataset and models enables the research community to build upon our
efforts and drive further advancements in AI-assisted materials science.
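The two headline numbers above are a classification score (stability) and a regression error (formation energy). The Python sketch below shows one plausible way to compute each. It is not the Matbench Discovery evaluation code. It assumes, as labelled in the comments, that a structure counts as stable when its energy above the convex hull is at most 0 eV/atom, and that "accuracy of 20 meV/atom" is read as a mean absolute error. The function names `stability_f1` and `mae_mev_per_atom` and all numbers in the example are hypothetical.

```python
"""Hypothetical sketch of the two metrics quoted in the abstract.

Assumptions (not taken from the OMat24 paper or Matbench Discovery code):
- a structure is labelled stable when its energy above the convex hull
  is <= 0 eV/atom;
- the formation-energy "accuracy" is read as a mean absolute error.
"""
from typing import Sequence


def stability_f1(e_hull_true: Sequence[float],
                 e_hull_pred: Sequence[float],
                 threshold: float = 0.0) -> float:
    """F1 score for classifying structures as stable (E_hull <= threshold, eV/atom)."""
    if len(e_hull_true) != len(e_hull_pred):
        raise ValueError("inputs must have the same length")
    tp = fp = fn = 0
    for true_e, pred_e in zip(e_hull_true, e_hull_pred):
        is_stable = true_e <= threshold
        predicted_stable = pred_e <= threshold
        if predicted_stable and is_stable:
            tp += 1  # correctly flagged as stable
        elif predicted_stable:
            fp += 1  # predicted stable, actually unstable
        elif is_stable:
            fn += 1  # missed a stable structure
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN), equal to the harmonic mean of precision and recall
    return 2 * tp / denom if denom else 0.0


def mae_mev_per_atom(e_true: Sequence[float], e_pred: Sequence[float]) -> float:
    """Mean absolute error in meV/atom, given energies in eV/atom."""
    if len(e_true) != len(e_pred) or not e_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(e_true, e_pred))
    return 1000.0 * total / len(e_true)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not OMat24 results.
    hull_true = [-0.02, 0.00, 0.05, 0.10, -0.01]
    hull_pred = [-0.01, 0.01, -0.02, 0.12, -0.03]
    print(round(stability_f1(hull_true, hull_pred), 3))  # 0.667

    ef_true = [-1.20, -0.85, -2.10]
    ef_pred = [-1.19, -0.87, -2.04]
    print(round(mae_mev_per_atom(ef_true, ef_pred), 1))  # 30.0
```

F1 ignores true negatives, so it rewards a model for recovering stable candidates without padding the shortlist with unstable ones, which is the relevant trade-off when stable materials are a small fraction of the screened set.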