AIDE: AI-Driven Exploration in the Space of Code

Abstract

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process of labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
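The tree-search framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the search policy, so the greedy "always refine the best node" rule is an assumption, and the names `Node`, `tree_search`, `toy_evaluate`, and `toy_propose` are hypothetical. A single float stands in for a candidate script, and a random perturbation stands in for an LLM-proposed code edit.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One candidate solution in the search tree (hypothetical structure)."""
    solution: float   # stand-in for a script; a real agent would store code
    score: float      # evaluation metric, higher is better
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def tree_search(
    evaluate: Callable[[float], float],
    propose: Callable[[Optional[float]], float],
    n_drafts: int = 3,
    n_steps: int = 20,
) -> Node:
    """Draft a few root solutions, then repeatedly refine the best node so far.

    The greedy selection rule is an assumption made for this sketch.
    """
    nodes: List[Node] = []
    # Draft phase: independent starting points become roots of the tree.
    for _ in range(n_drafts):
        candidate = propose(None)
        nodes.append(Node(candidate, evaluate(candidate)))
    # Refinement phase: reuse the most promising solution and branch from it.
    for _ in range(n_steps - n_drafts):
        best = max(nodes, key=lambda n: n.score)
        candidate = propose(best.solution)
        child = Node(candidate, evaluate(candidate), parent=best)
        best.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


rng = random.Random(0)  # fixed seed so the toy run is repeatable


def toy_evaluate(x: float) -> float:
    # Toy objective with its maximum (0.0) at x = 3; stands in for a validation metric.
    return -(x - 3.0) ** 2


def toy_propose(parent: Optional[float]) -> float:
    # Stand-in for an LLM call: draft from scratch, or perturb the parent solution.
    if parent is None:
        return rng.uniform(-10.0, 10.0)
    return parent + rng.gauss(0.0, 1.0)


best = tree_search(toy_evaluate, toy_propose)
```

Each call to `evaluate` corresponds to one unit of compute spent, so raising `n_steps` is the toy analogue of trading computational resources for a better score: the returned node is always the best of all candidates evaluated so far.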
