

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

March 2, 2026
Authors: Valentin Lacombe, Valentin Quesnel, Damien Sileo
cs.AI

Abstract

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and the same interface provides verifiable reward functions for reinforcement learning. Our experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality. Zero-shot evaluations confirm these tasks challenge frontier models such as GPT-5. The code and data are publicly available under the MIT license.
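The generator-plus-verifier pattern the abstract describes can be illustrated with a minimal sketch for one of the listed domains, systems of equations. All names below (`generate_linear_system_task`, `reward`, the `difficulty` parameter) are hypothetical and do not reflect the actual Reasoning Core API; the sketch only shows how a procedurally generated task can carry a solver-derived trace and a verifiable reward function under one interface.

```python
import random

def generate_linear_system_task(difficulty: int, seed: int = 0) -> dict:
    """Procedurally generate a 2x2 integer linear system with a known solution.

    `difficulty` scales the coefficient range, giving continuous difficulty
    control. Hypothetical interface, not the actual Reasoning Core API.
    """
    rng = random.Random(seed)
    bound = 2 + difficulty
    # Sample the ground-truth solution first, then build consistent equations.
    x, y = rng.randint(-bound, bound), rng.randint(-bound, bound)
    while True:
        a, b, c, d = (rng.randint(-bound, bound) for _ in range(4))
        if a * d - b * c != 0:  # nonzero determinant ensures a unique solution
            break
    e, f = a * x + b * y, c * x + d * y
    prompt = f"Solve for x and y: {a}x + {b}y = {e}; {c}x + {d}y = {f}"
    # Solver-derived reasoning trace via Cramer's rule.
    det = a * d - b * c
    trace = (f"det = {a}*{d} - {b}*{c} = {det}; "
             f"x = ({e}*{d} - {b}*{f})/{det}; y = ({a}*{f} - {e}*{c})/{det}")
    return {"prompt": prompt, "answer": (x, y), "trace": trace}

def reward(task: dict, proposed) -> float:
    """Verifiable reward: 1.0 iff the proposed (x, y) solves the system."""
    return 1.0 if tuple(proposed) == task["answer"] else 0.0
```

The same dict could serve both training modes the abstract mentions: the `trace` field supports supervised pre-training, while `reward` plugs directly into a reinforcement-learning loop.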
PDF · March 4, 2026