Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
September 22, 2025
Authors: Valentin Lacombe, Valentin Quesnel, Damien Sileo
cs.AI
Abstract
We introduce Reasoning Core, a new scalable environment for Reinforcement
Learning with Verifiable Rewards (RLVR), designed to advance foundational
symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks
that focus on games or isolated puzzles, Reasoning Core procedurally generates
problems across core formal domains, including PDDL planning, first-order
logic, context-free grammar parsing, causal reasoning, and equation system
solving. The environment is built on key design principles of high-generality
problem distributions, verification via external tools, and continuous
difficulty control, which together provide a virtually infinite supply of novel
training instances. Initial zero-shot evaluations with frontier LLMs confirm
the difficulty of Reasoning Core's tasks, positioning it as a promising
resource to improve the reasoning capabilities of future models.
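The RLVR loop described above (procedural instance generation with a difficulty parameter, plus reward computed by external verification rather than by comparing against a fixed reference answer) can be sketched in miniature. The sketch below is a hypothetical illustration using a toy equation-solving task, not code from Reasoning Core itself; the function names and the linear-equation domain are assumptions chosen for brevity.

```python
import random


def generate_instance(difficulty: int, rng: random.Random):
    """Procedurally generate a solvable instance; difficulty scales the
    coefficient range, giving an effectively unbounded instance supply."""
    a = rng.randint(1, difficulty + 1)
    x = rng.randint(-difficulty, difficulty)   # hidden ground-truth solution
    b = rng.randint(-difficulty, difficulty)
    c = a * x + b                               # constructed to be solvable
    prompt = f"Solve for x: {a}*x + {b} = {c}"
    return prompt, (a, b, c)


def verify(instance, answer: int) -> float:
    """Verifiable reward: check the proposed answer against the instance
    itself (here by substitution), not against a memorized solution."""
    a, b, c = instance
    return 1.0 if a * answer + b == c else 0.0
```

Because the reward is computed by re-checking the candidate answer, any correct solution earns full reward even if it differs from the generator's hidden one; in Reasoning Core this verification role is played by external tools such as planners and theorem provers for the respective formal domains.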