推論コア：大規模言語モデルの記号推論のためのスケーラブルな強化学習環境

要旨

我々は、大規模言語モデル（LLMs）における基礎的な記号推論を進展させるために設計された、検証可能な報酬を伴う強化学習（RLVR）のための新しいスケーラブルな環境「Reasoning Core」を紹介する。既存のベンチマークがゲームや孤立したパズルに焦点を当てているのとは異なり、Reasoning CoreはPDDLプランニング、一階述語論理、文脈自由文法の構文解析、因果推論、システム方程式の解法といったコアな形式的領域にわたって問題を手続き的に生成する。この環境は、高汎用性の問題分布、外部ツールによる検証、継続的な難易度制御という主要な設計原則に基づいて構築されており、これらを組み合わせることで、実質的に無限の新しい訓練インスタンスを提供する。最先端のLLMsを用いた初期のゼロショット評価では、Reasoning Coreのタスクの難しさが確認され、将来のモデルの推論能力を向上させるための有望なリソースとして位置づけられる。

English

We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.

推論コア：大規模言語モデルの記号推論のためのスケーラブルな強化学習環境

Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

要旨

Support