EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

Abstract

The increasing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks that require coordinating specialized roles, including idea generation and experimental execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and fail to adapt based on accumulated interaction histories. As a result, these systems overlook promising research directions, repeat failed experiments, and pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist contains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules enable the RA and EA to retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms seven open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity in both automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery.
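The two persistent memory modules can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the class names `MemoryEntry` and `PersistentMemory`, the `"success"`/`"failure"` labels, the sample entries, and the word-overlap retrieval are all hypothetical stand-ins, since the abstract does not specify how entries are stored or retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str     # distilled insight, e.g. a research direction or training strategy
    outcome: str  # "success" or "failure" (hypothetical labels)


@dataclass
class PersistentMemory:
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, outcome: str) -> None:
        """Store one distilled insight (the role the abstract assigns to the EMA)."""
        self.entries.append(MemoryEntry(text, outcome))

    def retrieve(self, query: str, k: int = 2) -> list[MemoryEntry]:
        """Return up to k entries sharing words with the query.

        Word overlap is an assumed placeholder for whatever retrieval
        the real system uses.
        """
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(e.text.lower().split())), e)
            for e in self.entries
        ]
        scored = [(score, e) for score, e in scored if score > 0]
        # Stable sort: ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


# One memory per module described in the abstract.
ideation_memory = PersistentMemory()
experimentation_memory = PersistentMemory()

# Hypothetical entries: feasible and failed directions, plus a training strategy.
ideation_memory.add("contrastive pretraining for graph retrieval", "success")
ideation_memory.add("full fine-tuning on tiny graph datasets", "failure")
experimentation_memory.add("normalize features before training graph models", "success")

# The RA would consult ideation memory before proposing a new idea.
hits = ideation_memory.retrieve("graph retrieval ideas")
```

Keeping failed directions in the same store as successful ones lets a retrieving agent see both what to build on and what to avoid, which mirrors the abstract's description of the ideation memory.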
