EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale

Abstract

The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. Although the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and unable to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster enables agents to iteratively refine hypotheses, critique their own work, and progressively accumulate knowledge across experimental cycles, mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is easy to scale up: developers can build and deploy capable, self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Building on EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) show that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It comprehensively outperforms the general-purpose baseline OpenClaw, with relative improvements ranging from +159% to +316%, supporting its efficacy and generality as a foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.
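The abstract describes a cycle of proposing a hypothesis, running an experiment, self-critiquing, and accumulating knowledge. A minimal toy sketch of such a loop is given below to make the idea concrete. It is not EvoMaster's API: every name here (`Record`, `evolve`, `propose`, `run_experiment`, `critique`) is hypothetical, the "hypothesis" is just a number, and the "experiment" is a made-up scoring function with its optimum at 3.0.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    """One experimental cycle: what was tried, how it scored, and the critique."""
    hypothesis: float
    score: float
    critique: str


def evolve(
    propose: Callable[[List[Record]], float],
    run_experiment: Callable[[float], float],
    critique: Callable[[float, float, List[Record]], str],
    cycles: int,
) -> List[Record]:
    """Generic propose -> experiment -> critique -> accumulate loop (illustrative only)."""
    memory: List[Record] = []  # knowledge accumulated across cycles
    for _ in range(cycles):
        hypothesis = propose(memory)
        score = run_experiment(hypothesis)
        note = critique(hypothesis, score, memory)
        memory.append(Record(hypothesis, score, note))
    return memory


# --- Toy stand-ins (assumptions for illustration, not real science) ---

def run_experiment(x: float) -> float:
    # Hypothetical objective: higher is better, best possible score at x = 3.0.
    return -(x - 3.0) ** 2


def propose(memory: List[Record]) -> float:
    if not memory:
        return 0.0  # initial guess
    best = max(memory, key=lambda r: r.score)
    step = 2.0 / len(memory)  # take smaller steps as knowledge accumulates
    direction = 1.0 if memory[-1].critique == "try higher" else -1.0
    return best.hypothesis + direction * step


def critique(x: float, score: float, memory: List[Record]) -> str:
    if not memory:
        return "try higher"  # arbitrary default for the first cycle
    best = max(memory, key=lambda r: r.score)
    improved = score > best.score
    moved_up = x > best.hypothesis
    # Keep going the same way if it helped; otherwise turn around.
    return "try higher" if improved == moved_up else "try lower"


memory = evolve(propose, run_experiment, critique, cycles=6)
best = max(memory, key=lambda r: r.score)
print(best)
```

In this toy run the fourth cycle overshoots the optimum, the critique flags it, and the next proposal steps back toward the best hypothesis found so far. A real agent would replace the three callables with LLM-driven planning, actual experiment execution, and a richer memory.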
