Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Abstract

We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, whereas most previous reasoning datasets are fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG both for evaluating reasoning models and for training them with reinforcement learning.
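
To make the generator-and-verifier pairing concrete, the following is a minimal sketch of one procedurally generated task with an adjustable difficulty level and a binary verifiable reward. It is an illustration only and does not use or reproduce the RG API: the names `Task`, `generate_task`, and `verify`, the integer `difficulty` parameter, and the choice of an addition task are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    """A single generated problem and its reference answer (hypothetical type)."""
    question: str
    answer: str


def generate_task(seed: int, difficulty: int) -> Task:
    """Hypothetical generator: an addition problem whose size grows with difficulty.

    Assumes difficulty >= 1. Higher difficulty means more digits per term
    and more terms. The same (seed, difficulty) pair always yields the same task.
    """
    rng = random.Random(seed)  # local RNG so generation is reproducible
    low, high = 10 ** (difficulty - 1), 10 ** difficulty - 1
    terms = [rng.randint(low, high) for _ in range(difficulty + 1)]
    question = " + ".join(str(t) for t in terms) + " = ?"
    return Task(question=question, answer=str(sum(terms)))


def verify(task: Task, response: str) -> float:
    """Hypothetical verifier: reward 1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if response.strip() == task.answer else 0.0
```

Because each task is a pure function of its seed and difficulty, iterating over seeds yields an effectively unbounded stream of problems, and sweeping `difficulty` gives the same stream at different complexity levels. A training or evaluation loop would pass a model's response to `verify` and use the returned value as the reward.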