Scaling Open-Ended Reasoning to Predict the Future

December 31, 2025
Authors: Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping
cs.AI

Abstract

High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus for both data generation and retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval and of an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing on the period from May to August 2025. Our specialized model, OpenForecaster 8B, matches much larger proprietary models, with our training improving the accuracy, calibration, and consistency of predictions. We find that calibration improvements from forecasting training generalize across popular benchmarks. We open-source all our models, code, and data to make research on language model forecasting broadly accessible.