Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Abstract

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and the same interface provides verifiable reward functions for reinforcement learning. Our experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality. Zero-shot evaluations confirm that these tasks challenge frontier models such as GPT-5. The code and data are publicly available under the MIT license.
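As an illustration of the generate-and-verify pattern described above, the following is a minimal sketch for one of the listed domains, systems of equations. It is not the Reasoning Core API: the names `Example`, `generate`, and `reward`, the mapping from `difficulty` to system size, and the use of direct substitution in place of an external solver are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical container for one generated task instance."""
    prompt: str
    coefficients: list[list[int]]
    constants: list[int]
    solution: list[int]


def generate(difficulty: float, rng: random.Random) -> Example:
    """Sample a linear system whose size and coefficient range grow with difficulty.

    `difficulty` is assumed to lie in [0, 1]; the scaling below is illustrative only.
    """
    n = 2 + round(difficulty * 4)        # number of unknowns
    bound = 3 + round(difficulty * 7)    # magnitude of sampled integers
    solution = [rng.randint(-bound, bound) for _ in range(n)]

    rows = []
    for i in range(n):
        row = [rng.randint(-bound, bound) for _ in range(n)]
        # Make the matrix strictly diagonally dominant, which guarantees
        # it is non-singular, so the system has exactly one solution.
        row[i] = sum(abs(c) for j, c in enumerate(row) if j != i) + 1
        rows.append(row)

    constants = [sum(c * x for c, x in zip(row, solution)) for row in rows]
    lines = [
        " + ".join(f"{c}*x{j}" for j, c in enumerate(row)) + f" = {b}"
        for row, b in zip(rows, constants)
    ]
    prompt = "Solve the system for integer values:\n" + "\n".join(lines)
    return Example(prompt, rows, constants, solution)


def reward(example: Example, answer: list[int]) -> float:
    """Return 1.0 if `answer` satisfies every equation exactly, else 0.0."""
    if len(answer) != len(example.constants):
        return 0.0
    satisfied = all(
        sum(c * x for c, x in zip(row, answer)) == b
        for row, b in zip(example.coefficients, example.constants)
    )
    return 1.0 if satisfied else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    example = generate(difficulty=0.5, rng=rng)
    print(example.prompt)
    print("reward for reference solution:", reward(example, example.solution))
```

Because the check is exact integer arithmetic, the same `reward` function could in principle serve both to filter supervised examples and to score model outputs during reinforcement learning, which is the dual use the abstract describes.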
