

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

January 5, 2026
Authors: Zhuofan Shi, Hubao A, Yufei Shao, Mengyan Dai, Yadong Yu, Pan Xiang, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing
cs.AI

Abstract
Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy--continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL)--to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench, the first benchmark for LAMMPS code generation and question answering, proposed in this work, our models and system surpass several strong baselines. This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2
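The generate-execute-evaluate-self-correct loop described for MDAgent2-RUNTIME, with simulation outcomes serving as reward signals and low-reward trajectories fed back for refinement (as in MD-GRPO), can be sketched as follows. This is a minimal illustration only; all function names (`generate_script`, `run_lammps`, `score_result`) and the toy stand-ins are hypothetical and do not reflect the paper's actual API or reward design.

```python
def closed_loop(task, generate_script, run_lammps, score_result,
                max_rounds=3, accept_threshold=0.8):
    """Iteratively refine a LAMMPS script until the simulation-based
    reward passes a threshold or the round budget is exhausted."""
    feedback = None
    best = (None, float("-inf"))
    for _ in range(max_rounds):
        script = generate_script(task, feedback)   # code-generation agent
        outcome = run_lammps(script)               # execution agent
        reward = score_result(outcome)             # evaluation agent
        if reward > best[1]:
            best = (script, reward)
        if reward >= accept_threshold:
            break
        # A low-reward trajectory is recycled as feedback for the next
        # round, analogous to MD-GRPO's reuse of low-reward trajectories.
        feedback = {"script": script, "outcome": outcome, "reward": reward}
    return best


# Toy stand-ins so the loop runs end to end (purely illustrative).
def toy_generate(task, feedback):
    # Pretend the model adds a missing "run" command after one round of feedback.
    return "units metal\nrun 1000" if feedback else "units metal"

def toy_run(script):
    return {"completed": "run" in script}

def toy_score(outcome):
    return 1.0 if outcome["completed"] else 0.0

script, reward = closed_loop("melt copper", toy_generate, toy_run, toy_score)
print(reward)  # the refined script passes on the second round
```

The loop returns the best-scoring script seen so far, so even if no round clears the threshold, the caller still receives the strongest candidate.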