MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Abstract

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and a lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. MLEvolve extends tree search to Progressive MCGS, which enables cross-branch information flow through graph-based reference edges and uses an entropy-inspired progressive schedule to shift the search gradually from broad exploration to focused exploitation. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for retrieving and reusing task-specific experience. For stable long-horizon iteration, we further decouple strategic planning from code generation through adaptive coding modes. Evaluation on MLE-Bench shows that, under a 12-hour budget (half the standard runtime), MLEvolve achieves state-of-the-art performance across multiple dimensions, including average medal rate and valid submission rate. Moreover, MLEvolve outperforms specialized algorithm discovery methods, including AlphaEvolve, on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.
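
To make the idea of shifting from broad exploration to focused exploitation concrete, the following is a minimal sketch and not the MLEvolve implementation. It assumes a standard UCT selection rule whose exploration constant decays linearly over the search budget. The linear decay stands in for the entropy-inspired schedule, whose exact form is not given here, and the names `exploration_weight`, `uct_score`, `select_child`, `c_start`, and `c_end` are hypothetical.

```python
import math


def exploration_weight(step, total_steps, c_start=1.4, c_end=0.05):
    """Hypothetical schedule: linearly decay the exploration constant.

    This is an assumed stand-in for an entropy-inspired schedule.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def uct_score(mean_reward, visits, parent_visits, c):
    """Standard UCT: mean reward plus a bonus for rarely visited nodes."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, step, total_steps):
    """Pick the child with the highest UCT score at the current step."""
    c = exploration_weight(step, total_steps)
    return max(
        children,
        key=lambda ch: uct_score(ch["mean_reward"], ch["visits"], parent_visits, c),
    )


# Toy example: "a" has the better mean reward, "b" is less explored.
children = [
    {"name": "a", "mean_reward": 0.6, "visits": 10},
    {"name": "b", "mean_reward": 0.5, "visits": 2},
]
early = select_child(children, parent_visits=12, step=0, total_steps=100)
late = select_child(children, parent_visits=12, step=100, total_steps=100)
```

In this toy setting the early selection favors the less-visited branch `b`, while the late selection favors the branch with the higher mean reward, `a`. Graph-based reference edges and memory retrieval are deliberately left out of the sketch.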