Forecasting Scientific Progress with Artificial Intelligence
May 21, 2026
Authors: Sean Wu, Pan Lu, Yupeng Chen, Jonathan Bragg, Yutaro Yamada, Peter Clark, David Clifton, Philip Torr, James Zou, Junchi Yu
cs.AI
Abstract
Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary, event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. Across 4,760 scientific events, we observe systematic and domain-dependent limitations in current frontier models. While models can identify plausible research directions from competing candidates, they fail to reliably predict whether scientific advances will be realized and systematically misestimate when they will occur. Performance is highly heterogeneous across domains, with the timing of AI progress more predictable than that of advances in biology, chemistry, and physics. Performance is largely insensitive to whether events occur before or after the training cutoff, suggesting that these limitations cannot be explained solely by knowledge exposure in training data. Under controlled information access, additional pre-cutoff knowledge improves performance but does not close the gap to full-information settings, and this gap is more pronounced for high-citation advances. Models also exhibit systematic overconfidence and strong response biases, indicating unreliable uncertainty estimation. Taken together, current AI systems fall short as predictive tools for scientific progress. Access to prior knowledge does not translate into reliable forecasting, and performance benefits more from post-event information than from forward-looking prediction.
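Two of the analyses summarized above, comparing performance on events before versus after a model's training cutoff and checking for overconfidence, can be illustrated with a minimal Python sketch. It assumes a hypothetical per-prediction record (event date, correctness, stated confidence) and measures overconfidence simply as mean stated confidence minus accuracy. Neither the `ForecastRecord` schema, the helper names, nor this metric is taken from CUSP, whose actual scoring may differ. The toy records are invented for illustration and are not results from the benchmark.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ForecastRecord:
    """One model prediction on one event (hypothetical schema, not CUSP's)."""
    event_date: date    # when the scientific advance actually occurred
    correct: bool       # whether the model's answer was right
    confidence: float   # model's stated probability of being right, in [0, 1]


def split_by_cutoff(records, cutoff):
    """Partition records into events before vs. on/after the training cutoff."""
    pre = [r for r in records if r.event_date < cutoff]
    post = [r for r in records if r.event_date >= cutoff]
    return pre, post


def accuracy(records):
    """Fraction of correct predictions; NaN for an empty split."""
    if not records:
        return float("nan")
    return sum(r.correct for r in records) / len(records)


def overconfidence_gap(records):
    """Mean stated confidence minus accuracy (assumed metric).

    Positive values mean the model claims more certainty than its accuracy supports.
    """
    if not records:
        return float("nan")
    mean_conf = sum(r.confidence for r in records) / len(records)
    return mean_conf - accuracy(records)


if __name__ == "__main__":
    # Toy records invented for illustration only.
    cutoff = date(2024, 6, 1)  # hypothetical training cutoff
    toy = [
        ForecastRecord(date(2023, 1, 10), True, 0.90),
        ForecastRecord(date(2023, 8, 2), False, 0.80),
        ForecastRecord(date(2024, 9, 15), True, 0.85),
        ForecastRecord(date(2025, 2, 20), False, 0.90),
    ]
    pre, post = split_by_cutoff(toy, cutoff)
    print("pre-cutoff accuracy:", accuracy(pre))    # 0.5
    print("post-cutoff accuracy:", accuracy(post))  # 0.5
    print("overconfidence gap:", overconfidence_gap(toy))
```

Similar pre- and post-cutoff accuracy in such a split is the kind of pattern the abstract describes as insensitivity to the training cutoff, and a positive gap corresponds to overconfidence; a real evaluation would also need per-domain and per-task breakdowns.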