EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale
April 19, 2026
作者: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen
cs.AI
Abstract
The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and unable to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster empowers agents to iteratively refine hypotheses, critique their own outputs, and progressively accumulate knowledge across experimental cycles, mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is easy to scale up: developers can build and deploy capable, self-evolving scientific agents for arbitrary disciplines in roughly 100 lines of code. Built on EvoMaster, we incubated the SciMaster ecosystem, spanning machine learning, physics, and general science. Evaluations on four benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) show that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively, outperforming the general-purpose baseline OpenClaw by relative margins of +159% to +316%. These results support its efficacy and generality as a foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.
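The self-evolution loop the abstract describes (refine hypotheses, self-critique, accumulate knowledge across experimental cycles) can be sketched in a few lines of plain Python. This is a hypothetical illustration of the control flow only: the class and method names (`EvolvingAgent`, `hypothesize`, `experiment`, `critique`, `run`) are not taken from EvoMaster's actual API, and the scoring is a stand-in for a real experiment.

```python
# Hypothetical sketch of the iterative hypothesize -> experiment ->
# self-critique -> accumulate-knowledge cycle described in the abstract.
# None of these names come from the EvoMaster codebase.
from dataclasses import dataclass, field

@dataclass
class EvolvingAgent:
    # accumulated critiques from past cycles; conditions future hypotheses
    knowledge: list = field(default_factory=list)

    def hypothesize(self, task: str) -> dict:
        # propose a candidate solution, conditioned on accumulated knowledge
        return {"task": task, "attempt": len(self.knowledge) + 1}

    def experiment(self, hypothesis: dict) -> int:
        # run the experiment; here, a placeholder score that improves
        # as more knowledge accumulates
        return hypothesis["attempt"]

    def critique(self, hypothesis: dict, score: int) -> dict:
        # self-critique: record what was tried and how well it did
        return {"hypothesis": hypothesis, "score": score}

    def run(self, task: str, cycles: int = 3) -> int:
        best = 0
        for _ in range(cycles):
            h = self.hypothesize(task)
            s = self.experiment(h)
            self.knowledge.append(self.critique(h, s))  # accumulate knowledge
            best = max(best, s)
        return best

agent = EvolvingAgent()
print(agent.run("fit a model", cycles=3))  # prints 3: score rises each cycle
```

The point of the sketch is the architecture, not the placeholder scoring: each cycle's critique is fed back into the knowledge store, so later hypotheses are conditioned on earlier failures, which is the "learning from trial and error" that the abstract contrasts with static agent frameworks.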