Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
June 17, 2026
Authors: Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen
cs.AI
Abstract
AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.
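To make the idea of persistent, traceable research artifacts and claim drift more concrete, the following is a minimal illustrative sketch in Python. It is not Xcientist's implementation: the `Artifact` and `Claim` schemas, the `grounded_in` field, and the `trace` and `has_claim_drift` helpers are all hypothetical names assumed here for illustration, and the drift check is a deliberately simple stand-in (a set comparison) for whatever validation the harness actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One persistent research record (hypothetical schema, not from the paper)."""
    artifact_id: str
    kind: str  # e.g. "evidence", "idea", "plan", "ablation", "repair"
    summary: str
    # Ids of the artifacts this one is grounded in (assumed linkage scheme).
    grounded_in: list[str] = field(default_factory=list)


@dataclass
class Claim:
    """A claimed mechanism and the components it depends on (hypothetical)."""
    mechanism: str
    required_components: set[str]


def trace(artifact_id: str, store: dict[str, Artifact]) -> list[str]:
    """Follow grounded_in links depth-first and return the visited ids in order."""
    seen: list[str] = []
    stack = [artifact_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(store[current].grounded_in)
    return seen


def has_claim_drift(claim: Claim, implemented_components: set[str]) -> bool:
    """Flag drift when the runnable artifact lacks a component the claim relies on."""
    return not claim.required_components <= implemented_components


# Illustrative data only; the ids and summaries are made up for this sketch.
store = {
    "ev1": Artifact("ev1", "evidence", "prior result on memory retrieval"),
    "idea1": Artifact("idea1", "idea", "gated retrieval", grounded_in=["ev1"]),
    "plan1": Artifact("plan1", "plan", "implement the gate", grounded_in=["idea1"]),
}
claim = Claim("gated retrieval", {"gate", "retriever"})

lineage = trace("plan1", store)                  # plan -> idea -> evidence
drifted = has_claim_drift(claim, {"retriever"})  # the gate was dropped in code
```

In this toy setting, a plan can be traced back to the evidence it rests on, and a runnable artifact that silently drops the `gate` component is flagged as no longer supporting the claimed mechanism.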