EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

Abstract

The increasing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks that require coordinating specialized roles, including idea generation and experimental execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and do not adapt based on accumulated interaction histories. As a result, these systems may overlook promising research directions, repeat failed experiments, or pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves its research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist contains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules enable the RA and EA to retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms seven open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity under both automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery.
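The role of the two persistent memory modules can be illustrated with a minimal Python sketch. The abstract does not specify how entries are stored or retrieved, so everything here is an assumption made for illustration: the `PersistentMemory` and `MemoryEntry` classes, the `add` and `retrieve` methods, the word-overlap scoring, and the example entries are all hypothetical and are not taken from the EvoScientist implementation.

```python
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercased word set used for a naive overlap score (an assumption)."""
    return set(text.lower().split())


@dataclass
class MemoryEntry:
    text: str      # distilled insight, e.g. a research direction or a training strategy
    success: bool  # False marks a previously unsuccessful direction


@dataclass
class PersistentMemory:
    """Hypothetical store standing in for the ideation or experimentation memory."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, success: bool) -> None:
        self.entries.append(MemoryEntry(text, success))

    def retrieve(self, query: str, k: int = 2) -> list[MemoryEntry]:
        """Return up to k entries that share the most words with the query."""
        q = _tokens(query)
        scored = [(len(q & _tokens(e.text)), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]  # drop unrelated entries
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


ideation_memory = PersistentMemory()
experimentation_memory = PersistentMemory()

# EMA role (illustrative): distill outcomes of earlier runs into reusable entries.
ideation_memory.add("contrastive pretraining for graph retrieval", success=True)
ideation_memory.add("full retraining on every graph update", success=False)
experimentation_memory.add("cache tokenized data before model training", success=True)

# RA role (illustrative): consult the ideation memory before proposing a new idea.
hits = ideation_memory.retrieve("graph retrieval with contrastive objectives")
```

In this sketch, an entry retrieved with `success=False` would signal a direction to avoid, while the experimentation memory would be queried in the same way by the EA before writing code. A real system would likely use a stronger retrieval method than word overlap.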
