

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

March 2, 2026
作者: Valentin Lacombe, Valentin Quesnel, Damien Sileo
cs.AI

Abstract

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and the same interface provides verifiable reward functions for reinforcement learning. Our experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality. Zero-shot evaluations confirm these tasks challenge frontier models such as GPT-5. The code and data are publicly available under the MIT license.
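To make the task-generation idea concrete, here is a minimal illustrative sketch (not the paper's actual API; all names are hypothetical) of one of the listed domains, systems of equations: a procedural generator with a continuous difficulty knob, an exact solver-style verifier, and the same check reused as a binary reward function for reinforcement learning.

```python
import random
from fractions import Fraction

def generate_linear_system(rng, difficulty=2):
    """Sample a random 2x2 integer linear system with a unique exact solution.

    `difficulty` simply widens the coefficient sampling range -- an
    illustrative stand-in for the continuous difficulty control described
    in the abstract."""
    hi = 3 + 2 * difficulty
    while True:
        a, b, c, d = (rng.randint(-hi, hi) for _ in range(4))
        if a * d - b * c != 0:  # nonzero determinant => unique solution
            break
    # Choose the solution first, then derive the right-hand sides from it.
    x = Fraction(rng.randint(-hi, hi))
    y = Fraction(rng.randint(-hi, hi))
    e = a * x + b * y
    f = c * x + d * y
    prompt = (f"Solve for x and y:\n"
              f"{a}*x + {b}*y = {e}\n"
              f"{c}*x + {d}*y = {f}")
    return prompt, (x, y)

def verify(answer, solution):
    """Solver-style rigorous check: exact (rational) match against ground truth."""
    return tuple(Fraction(v) for v in answer) == solution

def reward(answer, solution):
    """Binary verifiable reward, usable for RL post-training."""
    return 1.0 if verify(answer, solution) else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    prompt, sol = generate_linear_system(rng, difficulty=3)
    print(prompt)
    print(reward(sol, sol))
```

The key design point mirrored here is that generation and verification share one interface: the same ground-truth check supplies supervised labels during pre-training and reward signals during reinforcement learning.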