MatTools: Benchmarking Large Language Models for Materials Science Tools

May 16, 2025
Authors: Siyu Liu, Jiamin Xu, Beilin Ye, Bo Hu, David J. Srolovitz, Tongqi Wen
cs.AI

Abstract

Large language models (LLMs) are increasingly applied to materials science questions, including literature comprehension, property prediction, materials discovery and alloy design. At the same time, a wide range of physics-based computational approaches have been developed with which materials properties can be calculated. Here, we propose a benchmark application to evaluate the proficiency of LLMs in answering materials science questions through the generation and safe execution of code based on such physics-based computational materials science packages. MatTools is built on two complementary components: a materials simulation tool question-answer (QA) benchmark and a real-world tool-usage benchmark. We designed an automated methodology to efficiently collect real-world materials science tool-use examples. The QA benchmark, derived from the pymatgen (Python Materials Genomics) codebase and documentation, comprises 69,225 QA pairs that assess the ability of an LLM to understand materials science tools. The real-world benchmark contains 49 tasks (138 subtasks) requiring the generation of functional Python code for materials property calculations. Our evaluation of diverse LLMs yields three key insights: (1) Generalists outshine specialists; (2) AI knows AI; and (3) Simpler is better. MatTools provides a standardized framework for assessing and improving LLM capabilities for materials science tool applications, facilitating the development of more effective AI systems for materials science and general scientific research.
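The abstract does not say how generated code is executed or scored. As a rough illustration only, the sketch below assumes a minimal harness: each LLM-generated script runs in a separate Python subprocess with a timeout, and a subtask passes if the last value the script prints matches a reference number within a relative tolerance. The helper names (`run_generated_code`, `score_subtask`), the timeout, the tolerance and the "last printed line" convention are all hypothetical and are not taken from MatTools. Process isolation plus a timeout is only a minimal form of "safe execution", not a security sandbox.

```python
import math
import os
import subprocess
import sys
import tempfile


def run_generated_code(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Hypothetical helper: run LLM-generated code in a separate interpreter.

    Returns (success, output): stdout on success, stderr on a crash,
    or "timeout" if the script exceeds `timeout_s` seconds.
    Isolation is process-level only; file-system and network access are not restricted.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "candidate.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        try:
            proc = subprocess.run(
                [sys.executable, script],
                cwd=workdir,            # run inside a throwaway directory
                capture_output=True,
                text=True,
                timeout=timeout_s,      # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def score_subtask(code: str, expected: float, rel_tol: float = 1e-3) -> bool:
    """Hypothetical scoring rule: the last printed line must be a float close to `expected`."""
    ok, output = run_generated_code(code)
    if not ok or not output:
        return False
    try:
        value = float(output.splitlines()[-1])
    except ValueError:
        return False
    return math.isclose(value, expected, rel_tol=rel_tol)


if __name__ == "__main__":
    # Stand-in for model output; a real task would call a package such as pymatgen.
    candidate = "volume = 5.43 ** 3\nprint(volume)"
    print(score_subtask(candidate, expected=5.43 ** 3))  # prints True
```

Running each candidate in a fresh interpreter keeps a crashing or hanging script from taking down the evaluation loop, and comparing numbers with a tolerance rather than exact strings avoids penalizing harmless formatting differences.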
