MLEvolve: Automated Machine Learning Algorithm Discovery via a Self-Evolving Framework

Abstract

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and a lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based self-evolving multi-agent framework for end-to-end machine learning algorithm discovery. By extending tree search to Progressive MCGS, MLEvolve enables cross-branch information flow through graph-based reference edges and, following an entropy-inspired progressive schedule, gradually shifts the search from broad exploration to focused exploitation. To allow the agent to evolve with accumulated experience, we introduce Retrospective Memory, which combines a cold-start domain knowledge base with a dynamic global memory for task-specific experience retrieval and reuse. For stable long-horizon iteration, we further decouple strategic planning from code generation through adaptive coding modes. Evaluation on MLE-Bench shows that, under a 12-hour budget (half the standard runtime), MLEvolve achieves state-of-the-art performance across multiple dimensions, including average medal rate and valid submission rate. MLEvolve also outperforms specialized algorithm discovery methods, including AlphaEvolve, on mathematical algorithm optimization tasks, demonstrating strong cross-domain generalization. Our code is available at https://github.com/InternScience/MLEvolve.
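
The abstract does not spell out how the progressive schedule is computed, so the following is only a minimal sketch of the general idea of moving from broad exploration to focused exploitation. It is not the MLEvolve implementation: it swaps in a standard UCB1-style selection rule with a linearly annealed exploration constant in place of the paper's entropy-inspired schedule. The function names, the dictionary layout of `children`, and the values `c_start=1.4` and `c_end=0.1` are all hypothetical.

```python
import math


def exploration_weight(step, total_steps, c_start=1.4, c_end=0.1):
    """Hypothetical schedule: linearly anneal the exploration constant.

    Assumes total_steps > 0. Early steps return a value near c_start
    (broad exploration); late steps return a value near c_end
    (focused exploitation).
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def ucb_score(mean_reward, visits, parent_visits, c):
    """Standard UCB1 score; unvisited children are always tried first."""
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, step, total_steps):
    """Pick the child with the highest UCB score under the current schedule.

    `children` is a list of dicts with keys "name", "mean", and "visits"
    (an assumed layout used only for this illustration).
    """
    c = exploration_weight(step, total_steps)
    return max(
        children,
        key=lambda ch: ucb_score(ch["mean"], ch["visits"], parent_visits, c),
    )


if __name__ == "__main__":
    candidates = [
        {"name": "well_explored", "mean": 0.6, "visits": 10},
        {"name": "rarely_tried", "mean": 0.5, "visits": 2},
    ]
    # Early in the run the rarely tried branch is favored.
    print(select_child(candidates, 12, step=0, total_steps=100)["name"])
    # Late in the run the branch with the better mean is favored.
    print(select_child(candidates, 12, step=100, total_steps=100)["name"])
```

With the same two candidate branches, the selection changes from the less-visited branch at the start of the budget to the higher-scoring branch at the end, which is the qualitative behavior the abstract describes.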