Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Abstract

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, and fail to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
