Can We Predict Before Executing Machine Learning Agents?

January 9, 2026
Authors: Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang
cs.AI

Abstract

Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we draw inspiration from World Models and internalize execution priors, replacing costly runtime checks with instantaneous predictive reasoning. In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6x acceleration in convergence while surpassing execution-based baselines by 6%. Our code and dataset will be publicly available soon at https://github.com/zjunlp/predict-before-execute.
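
The abstract describes a Predict-then-Verify loop: the agent ranks candidate solutions by predicted pairwise preference (conditioned on a verified data analysis report) and falls back to real execution only when the prediction is uncertain. The following is a minimal Python sketch of that idea under stated assumptions; `Solution`, `llm_preference`, `execute_solution`, and the 0.7 confidence threshold are hypothetical placeholders for illustration, not the paper's actual interfaces.

```python
# Minimal sketch of a Predict-then-Verify selection loop (illustrative only).
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Solution:
    name: str
    description: str  # e.g. a candidate feature-engineering / modeling pipeline


def llm_preference(report: str, a: Solution, b: Solution) -> Tuple[Solution, float]:
    """Placeholder for an LLM call that, primed with a verified data-analysis
    report, predicts which candidate will score higher and how confident it is.
    A dummy deterministic rule is used here so the example runs end to end."""
    winner = a if len(a.description) >= len(b.description) else b
    return winner, 0.8


def predict_then_verify(
    report: str,
    candidates: List[Solution],
    execute_solution: Callable[[Solution], float],
    confidence_threshold: float = 0.7,
) -> Solution:
    """Keep a running incumbent; trust high-confidence predictions and only
    fall back to (expensive) execution when the prediction is uncertain."""
    best = candidates[0]
    for challenger in candidates[1:]:
        predicted_winner, confidence = llm_preference(report, best, challenger)
        if confidence >= confidence_threshold:
            best = predicted_winner  # skip the costly run entirely
        elif execute_solution(challenger) > execute_solution(best):
            best = challenger  # low confidence: verified by real execution
    return best


if __name__ == "__main__":
    report = "Verified data analysis report: tabular data, heavy class imbalance, ..."
    candidates = [
        Solution("baseline", "gradient boosting with raw features"),
        Solution("v2", "gradient boosting with target encoding and class weighting"),
    ]
    # Dummy executor standing in for actually training and evaluating a pipeline.
    best = predict_then_verify(report, candidates, execute_solution=lambda s: len(s.name))
    print("Selected:", best.name)
```

In this sketch, execution is invoked only on the low-confidence branch, which is the mechanism by which such a loop could trade expensive runs for cheap predictive comparisons.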