Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
January 15, 2026
Authors: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen
cs.AI
Abstract
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, and they fail to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
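The tiering idea in the abstract (transient execution traces distilled into stable knowledge, then into cross-task wisdom) can be illustrated with a small sketch. This is not the authors' implementation: the class name `TieredCache`, the three list-based tiers, the `trace_limit` threshold, and the `placeholder_summarize` function are all assumptions made for illustration. In a real agent the summarizer would presumably be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def placeholder_summarize(items: List[str]) -> str:
    # Hypothetical stand-in for a distillation step (e.g. an LLM summary).
    return f"summary of {len(items)} items"


@dataclass
class TieredCache:
    """Illustrative three-tier store; tier names and limits are assumptions."""
    summarize: Callable[[List[str]], str]
    trace_limit: int = 4  # max raw traces kept verbatim before distilling
    traces: List[str] = field(default_factory=list)     # transient tier
    knowledge: List[str] = field(default_factory=list)  # stable, per-task tier
    wisdom: List[str] = field(default_factory=list)     # cross-task tier

    def record(self, trace: str) -> None:
        """Store a raw trace; distill the transient tier when it overflows."""
        self.traces.append(trace)
        if len(self.traces) > self.trace_limit:
            self.knowledge.append(self.summarize(self.traces))
            self.traces.clear()

    def finish_task(self) -> None:
        """Fold what remains of the current task into one cross-task note."""
        leftover = self.knowledge + self.traces
        if leftover:
            self.wisdom.append(self.summarize(leftover))
        self.traces.clear()
        self.knowledge.clear()

    def context(self) -> str:
        """Build the prompt context, most durable tier first."""
        return "\n".join(self.wisdom + self.knowledge + self.traces)


cache = TieredCache(summarize=placeholder_summarize, trace_limit=2)
for step in ["a", "b", "c"]:
    cache.record(step)          # the third trace triggers distillation
cache.record("d")
mid_task_context = cache.context()
cache.finish_task()
```

In this sketch the context passed to the model stays bounded by `trace_limit` plus the number of distilled notes, rather than growing with every execution step, which is the decoupling of immediate execution from long-term strategy that the abstract describes.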