

EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale

April 19, 2026
作者: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen
cs.AI

Abstract

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster empowers agents to iteratively refine hypotheses, self-critique, and progressively accumulate knowledge across experimental cycles, faithfully mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is exceptionally easy to scale up -- enabling developers to build and deploy highly capable, self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Built upon EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four authoritative benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) demonstrate that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It comprehensively outperforms the general-purpose baseline OpenClaw with relative improvements ranging from +159% to +316%, robustly validating its efficacy and generality as the premier foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.