Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research

Abstract

Large language models hold promise as scientific assistants, yet existing agents rely either on algorithm evolution alone or on deep research in isolation, and both approaches face critical limitations. Pure algorithm evolution, as in AlphaEvolve, depends only on the internal knowledge of LLMs and quickly plateaus in complex domains, while pure deep research proposes ideas without validation, resulting in unrealistic or unimplementable solutions. We present DeepEvolve, an agent that integrates deep research with algorithm evolution, uniting external knowledge retrieval, cross-file code editing, and systematic debugging under a feedback-driven iterative loop. Each iteration not only proposes new hypotheses but also refines, implements, and tests them, avoiding both shallow improvements and unproductive over-refinements. Across nine benchmarks in chemistry, mathematics, biology, materials, and patents, DeepEvolve consistently improves the initial algorithm, producing executable new algorithms with sustained gains. By bridging the gap between unguided evolution and ungrounded research, DeepEvolve provides a reliable framework for advancing scientific algorithm discovery. Our code is available at https://github.com/liugangcode/deepevolve.
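The feedback-driven loop described in the abstract (propose, implement, debug, test, keep what improves) can be illustrated with a toy sketch. This is not DeepEvolve's implementation: the names `KNOWLEDGE_POOL`, `propose`, `implement`, `run_safely`, `evaluate`, and `evolve` are hypothetical, the "external knowledge" is a fixed list of step sizes, and the "benchmark" is a one-dimensional score function chosen only so the example runs on its own.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for external knowledge retrieval: a fixed pool of ideas.
KNOWLEDGE_POOL = [0.5, 0.1, -0.1, -0.5, 0.0]


def propose(rng: random.Random) -> float:
    """Propose a hypothesis (here, a step to apply to the current parameter)."""
    return rng.choice(KNOWLEDGE_POOL)


def implement(current: float, step: float) -> Callable[[], float]:
    """Turn a hypothesis into an executable candidate.

    A zero step is treated as a buggy candidate so the loop has
    something to catch (an assumption made for illustration only).
    """
    def candidate() -> float:
        if step == 0.0:
            raise ValueError("degenerate candidate")
        return current + step
    return candidate


def run_safely(candidate: Callable[[], float]) -> Optional[float]:
    """Execute a candidate; return None if it fails instead of crashing the loop."""
    try:
        return candidate()
    except ValueError:
        return None


def evaluate(x: float) -> float:
    """Toy benchmark score: higher is better, maximised at x = 2."""
    return -(x - 2.0) ** 2


def evolve(initial: float, iterations: int, seed: int = 0) -> Tuple[float, float]:
    """Run the propose/implement/test loop and keep only improving candidates."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(iterations):
        step = propose(rng)
        result = run_safely(implement(best, step))
        if result is None:
            continue  # failed candidates are discarded
        score = evaluate(result)
        if score > best_score:  # feedback: accept only measured improvements
            best, best_score = result, score
    return best, best_score


if __name__ == "__main__":
    print(evolve(0.0, 200))
```

Because a candidate is accepted only when its measured score beats the current best, the returned score can never be worse than that of the initial algorithm, which mirrors the abstract's point that ideas are validated by execution rather than proposed without testing.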