MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Abstract

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution is a key capability. However, existing MLE agents suffer from information isolation between branches, memoryless search, and a lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based, self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. By extending tree search to Progressive MCGS, MLEvolve enables cross-branch information flow through graph-based reference edges, and it uses an entropy-inspired progressive schedule to shift the search gradually from broad exploration to focused exploitation. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for retrieving and reusing task-specific experience. For stable long-horizon iteration, we further decouple strategic planning from code generation and adopt adaptive coding modes. Evaluation on MLE-Bench shows that MLEvolve achieves state-of-the-art performance across multiple dimensions, including average medal rate and valid submission rate, under a 12-hour budget (half the standard runtime). Moreover, MLEvolve outperforms specialized algorithm discovery methods, including AlphaEvolve, on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.
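The two search ideas named above, reference edges between branches and a gradual shift from exploration to exploitation, can be illustrated with a minimal sketch. This is not the MLEvolve implementation. The `Node` class, `add_reference`, `exploration_weight`, `select_child`, and the constants `c_start` and `c_end` are hypothetical names chosen for illustration, and a plain linear anneal of a UCB1 exploration constant stands in for the entropy-inspired schedule, whose exact form the abstract does not give.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate solution in the search graph (hypothetical structure)."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    children: list[Node] = field(default_factory=list)
    # Reference edges point at nodes in other branches whose ideas may be reused.
    references: list[Node] = field(default_factory=list)

    @property
    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def add_reference(src: Node, dst: Node) -> None:
    """Record a cross-branch reference edge, skipping self-loops and duplicates."""
    if dst is not src and all(ref is not dst for ref in src.references):
        src.references.append(dst)


def exploration_weight(step: int, total_steps: int,
                       c_start: float = 1.4, c_end: float = 0.1) -> float:
    """Linearly anneal the exploration constant (assumed stand-in schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def select_child(parent: Node, step: int, total_steps: int) -> Node:
    """Pick the child with the highest UCB1 score under the annealed weight."""
    c = exploration_weight(step, total_steps)

    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # always try unvisited children first
        bonus = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
        return child.mean_reward + bonus

    return max(parent.children, key=score)


if __name__ == "__main__":
    strong = Node("strong", visits=90, total_reward=54.0)  # mean reward 0.6
    rare = Node("rare", visits=10, total_reward=5.0)       # mean reward 0.5
    root = Node("root", visits=100, children=[strong, rare])
    add_reference(rare, strong)  # "rare" may borrow ideas from "strong"
    print(select_child(root, step=0, total_steps=100).name)
    print(select_child(root, step=100, total_steps=100).name)
```

In this toy setup, the rarely visited child wins selection at the start of the budget because its exploration bonus is large. Once the weight has decayed at the end of the budget, the child with the higher mean reward wins instead.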