Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Abstract

We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, whereas most previous reasoning datasets are fixed. Because the data is generated procedurally, models can be evaluated continuously across varying difficulty levels. Our experimental results demonstrate the efficacy of RG for both evaluating reasoning models and training them with reinforcement learning.
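
To make the generator-plus-verifier idea concrete, here is a minimal sketch of what a procedurally generated task with adjustable complexity and a verifiable reward could look like. It is an illustration only and is not RG's actual API: the names `Task`, `generate_task`, `verify`, and the difficulty parameters `num_terms` and `max_value` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One generated problem together with its reference answer."""
    question: str
    answer: str


def generate_task(seed: int, num_terms: int = 2, max_value: int = 9) -> Task:
    """Deterministically build one addition/subtraction problem from a seed.

    Hypothetical difficulty knobs: `num_terms` sets the expression length,
    `max_value` sets the size of each operand.
    """
    rng = random.Random(seed)  # a private RNG makes each seed reproducible
    terms = [rng.randint(0, max_value) for _ in range(num_terms)]
    ops = [rng.choice("+-") for _ in range(num_terms - 1)]

    total = terms[0]
    expr = str(terms[0])
    for op, term in zip(ops, terms[1:]):
        total = total + term if op == "+" else total - term
        expr += f" {op} {term}"
    return Task(question=f"Compute {expr}.", answer=str(total))


def verify(task: Task, response: str) -> float:
    """Return a binary reward: 1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if response.strip() == task.answer else 0.0


# Same seed, different difficulty settings (assumed values for illustration).
easy = generate_task(seed=0)
hard = generate_task(seed=0, num_terms=6, max_value=999)
print(easy.question, verify(easy, easy.answer))
print(hard.question, verify(hard, "not a number"))
```

Because every seed yields a distinct, reproducible problem and the reference answer is computed alongside the question, the supply of tasks is effectively unbounded and each response can be scored automatically, which is the property the abstract attributes to procedural generation.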