AIDE: AI-Driven Exploration in the Space of Code

Abstract

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
