Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

January 15, 2026
Authors: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen
cs.AI

Abstract

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, and they fail to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
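
As a rough illustration of the tiering idea described in the abstract, the Python sketch below shows how raw execution traces might be distilled into shorter summaries and then into cross-task lessons, so that the context handed to a model stays bounded. It is not the authors' implementation. The class name `TieredCache`, the three tier names, the `trace_limit` threshold, and the string-based summary (a stand-in for whatever distillation step a real agent would use) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TieredCache:
    """Hypothetical three-tier store; names and thresholds are assumptions."""
    trace_limit: int = 4  # raw traces kept before they are distilled
    traces: list[str] = field(default_factory=list)     # transient execution traces
    knowledge: list[str] = field(default_factory=list)  # per-task summaries
    wisdom: list[str] = field(default_factory=list)     # cross-task lessons

    def record(self, trace: str) -> None:
        """Append a raw trace and distill once the transient tier is full."""
        self.traces.append(trace)
        if len(self.traces) >= self.trace_limit:
            self.distill()

    def distill(self) -> None:
        """Collapse raw traces into one summary line (placeholder logic)."""
        if not self.traces:
            return
        summary = f"{len(self.traces)} steps, last: {self.traces[-1]}"
        self.knowledge.append(summary)
        self.traces.clear()

    def finish_task(self, lesson: str) -> None:
        """Flush remaining traces, keep one lesson, drop per-task summaries."""
        self.distill()
        self.wisdom.append(lesson)
        self.knowledge.clear()

    def context(self) -> str:
        """Assemble context: long-lived tiers first, recent traces last."""
        return "\n".join(self.wisdom + self.knowledge + self.traces)


cache = TieredCache(trace_limit=2)
cache.record("loaded train.csv")
cache.record("baseline fit done")
cache.record("tuning learning rate")
print(cache.context())
```

In this toy version the context grows with the number of summaries and lessons rather than with the number of raw steps, which is the property the abstract attributes to separating immediate execution from long-term strategy.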