Scaling Open-Ended Reasoning to Predict the Future

December 31, 2025
Authors: Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping
cs.AI

Abstract

High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus for both data generation and retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval and of an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing between May and August 2025. Our specialized model, OpenForecaster 8B, matches much larger proprietary models, and our training improves the accuracy, calibration, and consistency of its predictions. We find that the calibration improvements from forecasting training generalize across popular benchmarks. We open-source all our models, code, and data to make research on language model forecasting broadly accessible.
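The abstract says an offline news corpus is used so that no future information leaks into data generation or retrieval. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each article carries a publication date and that retrieval may only see articles published strictly before the date a question is posed. The `Article` class, the `leak_free_pool` function, and the two sample headlines are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one item in an offline news corpus."""
    title: str
    published: date


def leak_free_pool(articles, question_date):
    """Keep only articles published strictly before the question date."""
    return [a for a in articles if a.published < question_date]


# Made-up sample corpus, for illustration only.
corpus = [
    Article("Central bank hints at rate cut", date(2025, 4, 28)),
    Article("Central bank announces rate cut", date(2025, 5, 20)),
]

# A question posed on 2025-05-01 may only draw on the April article.
pool = leak_free_pool(corpus, date(2025, 5, 1))
print([a.title for a in pool])
```

A strict `<` comparison is the conservative choice here: an article published on the same day as the question is excluded, since it might already report the outcome.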
January 2, 2026