Can We Predict Before Executing Machine Learning Agents?
January 9, 2026
Authors: Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we internalize execution priors to substitute costly runtime checks with instantaneous predictive reasoning, drawing inspiration from World Models. In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6x acceleration in convergence while surpassing execution-based baselines by +6%. Our code and dataset will be publicly available soon at https://github.com/zjunlp/predict-before-execute.
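As a rough illustration of the Predict-then-Verify idea, the following minimal Python sketch ranks candidate solutions with a cheap pairwise preference predictor and spends real execution only on a short list. It is not the FOREAGENT implementation: the function `predict_then_verify`, the callbacks `prefer` and `execute`, and the parameters `top_k` and `min_confidence` are all hypothetical names chosen for this example, and the confidence-weighted win count is just one simple way to aggregate pairwise predictions.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def predict_then_verify(
    candidates: Sequence[str],
    prefer: Callable[[str, str], Tuple[str, float]],
    execute: Callable[[str], float],
    top_k: int = 1,
    min_confidence: float = 0.5,
) -> Tuple[str, float, int]:
    """Return (best_candidate, verified_score, number_of_executions).

    Assumptions (hypothetical, for illustration only):
    - candidates are unique identifiers of proposed solutions;
    - prefer(a, b) stands in for an LLM-based predictor and returns
      the predicted winner together with a confidence in [0, 1];
    - execute(c) is the expensive real run and returns a score
      where higher is better.
    """
    # Predict: accumulate confidence-weighted wins over all pairs,
    # ignoring predictions the predictor itself is unsure about.
    wins = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner, confidence = prefer(a, b)
        if confidence >= min_confidence:
            wins[winner] += confidence

    # Keep only the candidates the predictor ranks highest.
    ranked = sorted(candidates, key=lambda c: wins[c], reverse=True)
    shortlist = ranked[:top_k]

    # Verify: pay for real execution only on the short list.
    scores = {c: execute(c) for c in shortlist}
    best = max(scores, key=scores.get)
    return best, scores[best], len(shortlist)
```

In this sketch the number of expensive executions drops from `len(candidates)` to `top_k`, at the price of a quadratic number of cheap pairwise predictions. How much that saves in practice depends on how accurate and how well calibrated the predictor is.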