EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

Abstract


The increasing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks that require coordinating specialized roles, such as idea generation and experimental execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and fail to adapt to their accumulated interaction histories. As a result, these systems overlook promising research directions, repeat failed experiments, and pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves its research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist maintains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules enable the RA and EA to retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms 7 open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity scores under both automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating the effectiveness of persistent memory for end-to-end scientific discovery.
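The memory modules described above amount to a store that the EMA writes distilled insights into and that the RA and EA query before acting. The sketch below is a minimal, hypothetical illustration of that write/retrieve loop for the ideation memory only. The names `PersistentMemory`, `MemoryEntry`, and `evolve_ideation_memory`, the `"feasible"`/`"failed"` labels, the `top_k` cutoff, and the word-overlap retrieval are all assumptions made for illustration; the abstract does not specify how EvoScientist stores or retrieves memories.

```python
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    # Lower-cased word set; a crude, assumed stand-in for semantic retrieval.
    return set(text.lower().split())


@dataclass
class MemoryEntry:
    text: str
    kind: str  # hypothetical labels: "feasible" or "failed"


@dataclass
class PersistentMemory:
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, kind: str) -> None:
        self.entries.append(MemoryEntry(text, kind))

    def retrieve(self, query: str, k: int = 3) -> list[MemoryEntry]:
        # Score each entry by word overlap with the query, drop entries with
        # no overlap, and return the k highest-scoring ones (ties keep insertion order).
        q = _tokens(query)
        scored = [(len(q & _tokens(e.text)), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def evolve_ideation_memory(memory: PersistentMemory,
                           ranked_ideas: list[str],
                           failed_ideas: list[str],
                           top_k: int = 2) -> None:
    # Hypothetical EMA step: keep only the top-ranked ideas as feasible
    # directions, and record unsuccessful ones so they can be avoided later.
    for idea in ranked_ideas[:top_k]:
        memory.add(idea, "feasible")
    for idea in failed_ideas:
        memory.add(idea, "failed")


if __name__ == "__main__":
    ideation = PersistentMemory()
    evolve_ideation_memory(
        ideation,
        ranked_ideas=["contrastive pretraining for tabular data",
                      "curriculum sampling for graph models",
                      "random idea"],
        failed_ideas=["full fine-tuning on tabular data exceeded GPU memory"],
    )
    # A hypothetical RA query: both promising and failed directions come back.
    for entry in ideation.retrieve("pretraining strategy for tabular data"):
        print(entry.kind, "|", entry.text)
```

Returning failed directions alongside feasible ones mirrors the abstract's point that the ideation memory should help the RA avoid repeating unsuccessful directions. Word overlap keeps the sketch dependency-free; an embedding-similarity retriever would be a more typical choice in practice, and the same store shape could hold the experimentation memory's data processing and training strategies.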
