Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Abstract

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and the same interface provides verifiable reward functions for reinforcement learning. Our experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality. Zero-shot evaluations confirm these tasks challenge frontier models such as GPT-5. The code and data are publicly available under the MIT license.
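The abstract describes each task as four pieces: a procedural generator, a solver that verifies answers, a difficulty knob, and a verifiable reward. The sketch below illustrates that pattern for the systems-of-equations domain. It is not the Reasoning Core API. Several choices are hypothetical and made only for illustration:

- the function names `generate_system`, `solve_exact`, and `reward`;
- the mapping from an integer `difficulty` to system size and coefficient range;
- the use of exact Gauss-Jordan elimination over rationals in place of an external solver.

```python
import random
from fractions import Fraction


def solve_exact(A, b):
    """Gauss-Jordan elimination over rationals; returns the unique solution or None.

    Stand-in for an external solver (an assumption, not the suite's verifier).
    """
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # singular system: no unique solution
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]  # normalize the pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def generate_system(difficulty, seed=None):
    """Hypothetical generator: `difficulty` sets the number of unknowns and coefficient range."""
    rng = random.Random(seed)
    n = 2 + difficulty
    bound = 3 + 2 * difficulty
    while True:
        x = [rng.randint(-bound, bound) for _ in range(n)]  # hidden integer solution
        A = [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(n)]
        b = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        sol = solve_exact(A, b)
        if sol is not None:  # reject singular systems so the answer is unique
            break
    names = [f"x{i}" for i in range(n)]
    lines = [
        " + ".join(f"{a}*{v}" for a, v in zip(row, names)) + f" = {bi}"
        for row, bi in zip(A, b)
    ]
    prompt = "Solve the system:\n" + "\n".join(lines)
    return prompt, sol


def reward(answer, solution):
    """Verifiable binary reward: 1.0 iff the parsed answer matches the solver's solution."""
    try:
        values = [Fraction(tok) for tok in answer.replace(",", " ").split()]
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers earn no reward
    return 1.0 if values == list(solution) else 0.0


if __name__ == "__main__":
    prompt, sol = generate_system(difficulty=1, seed=0)
    print(prompt)
    print(reward(" ".join(str(v) for v in sol), sol))  # a correct answer scores 1.0
```

In this sketch the same solver output serves two purposes. It can be serialized as a supervised target for pre-training, and it can be compared against model outputs as a reward for reinforcement learning. Raising `difficulty` makes the problems harder by increasing both the number of unknowns and the coefficient magnitudes. A real curriculum could expose a finer-grained, continuous parameter instead.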
