EvoScientist: A Multi-Agent Evolving AI Scientist for End-to-End Scientific Discovery

Abstract

The growing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks that require coordinating specialized roles, including idea generation and experiment execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and cannot adapt based on accumulated interaction histories. As a result, these systems may overlook promising research directions, repeat failed experiments, and pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves its research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist maintains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules let the RA and EA retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms seven open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity under both automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery.
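
To make the role of the two persistent memory modules more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The names `MemoryEntry` and `PersistentMemory`, the `succeeded` flag, and the word-overlap retrieval score are all assumptions introduced here; the abstract does not specify how memories are stored or retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    # Hypothetical record: a distilled insight plus whether it worked before.
    text: str
    succeeded: bool  # False marks a previously unsuccessful direction or strategy


@dataclass
class PersistentMemory:
    # Hypothetical store; one instance each for ideation and experimentation.
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, succeeded: bool) -> None:
        self.entries.append(MemoryEntry(text, succeeded))

    def retrieve(self, query: str, k: int = 3) -> list[MemoryEntry]:
        # Toy relevance score (an assumption): count of words shared with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(entry.text.lower().split())), entry)
            for entry in self.entries
        ]
        relevant = [pair for pair in scored if pair[0] > 0]
        # Stable sort keeps insertion order among entries with equal scores.
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in relevant[:k]]


# One memory per module described in the abstract.
ideation_memory = PersistentMemory()
experimentation_memory = PersistentMemory()

# Entries a manager agent might write after earlier runs (made-up examples).
ideation_memory.add("contrastive pretraining for graph retrieval", succeeded=True)
ideation_memory.add("end-to-end training on raw graph logs", succeeded=False)
experimentation_memory.add(
    "normalize inputs before training the graph encoder", succeeded=True
)

# A researcher agent could consult the ideation memory before proposing ideas.
hits = ideation_memory.retrieve("graph retrieval ideas", k=2)
for hit in hits:
    label = "feasible" if hit.succeeded else "previously failed"
    print(f"{label}: {hit.text}")
```

In this sketch both successful and failed entries are returned, so that a retrieving agent can steer toward the former and away from the latter. A real system would more plausibly use embedding-based retrieval and LLM-written summaries than word overlap.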