Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Abstract

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, in which runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but also by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.