MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
January 5, 2026
Authors: Zhuofan Shi, Hubao A, Yufei Shao, Mengyan Dai, Yadong Yu, Pan Xiang, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing
cs.AI
Abstract
Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent framework, we present MDAgent2, the first end-to-end system capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy, consisting of continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We also build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench, the first benchmark for LAMMPS code generation and question answering, proposed in this work, our models and system surpass several strong baselines. This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2
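The closed-loop idea behind MD-GRPO, turning simulation outcomes into rewards and recycling low-reward trajectories, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, the binary run-success/step-completion reward shape, and the 0.5 threshold are all assumptions made for the example.

```python
# Hypothetical sketch of an MD-GRPO-style reward loop: each generated LAMMPS
# script is run, its outcome is mapped to a scalar reward, and low-reward
# trajectories are recycled into the pool for continual refinement.
# All names and thresholds below are illustrative assumptions.

def reward_from_simulation(ran_ok: bool, completed_steps: int, total_steps: int) -> float:
    """Map a simulation outcome to a scalar reward in [0, 1]."""
    if not ran_ok:
        return 0.0  # script failed to execute at all
    return completed_steps / total_steps  # partial credit for partial runs


def grpo_round(trajectories, threshold=0.5):
    """Split one batch into kept (high-reward) and recycled (low-reward) sets."""
    kept, recycled = [], []
    for traj in trajectories:
        r = reward_from_simulation(*traj["outcome"])
        (kept if r >= threshold else recycled).append({**traj, "reward": r})
    return kept, recycled


# Toy batch: (ran_ok, completed_steps, total_steps) per generated script.
batch = [
    {"script": "in.melt",  "outcome": (True, 1000, 1000)},   # full run
    {"script": "in.crack", "outcome": (True, 200, 1000)},    # crashed mid-run
    {"script": "in.bad",   "outcome": (False, 0, 1000)},     # syntax error
]
kept, recycled = grpo_round(batch)
```

In a real training loop the recycled trajectories would be revised (e.g. by the self-correction agents mentioned in the abstract) and resubmitted, so that low-reward attempts still contribute learning signal rather than being discarded.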