SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Abstract

Recent advances such as OpenAI-o1 and DeepSeek R1 have demonstrated the potential of Reinforcement Learning (RL) to enhance the reasoning abilities of Large Language Models (LLMs). While open-source replication efforts have focused primarily on the mathematical and coding domains, methods and resources for developing general reasoning capabilities remain underexplored. This gap is partly due to the difficulty of collecting diverse, verifiable reasoning data suitable for RL. We hypothesize that logical reasoning is critical for developing general reasoning capabilities, since logic is a fundamental building block of reasoning. In this work, we present SynLogic, a data synthesis framework and dataset that generates logical reasoning data at scale and covers 35 diverse logical reasoning tasks. The SynLogic approach enables controlled synthesis of data with adjustable difficulty and quantity. Importantly, all examples can be verified by simple rules, making them ideally suited for RL with verifiable rewards. In our experiments, we validate the effectiveness of RL training on the SynLogic dataset using 7B and 32B models. Training on SynLogic yields state-of-the-art logical reasoning performance among open-source datasets, surpassing DeepSeek-R1-Distill-Qwen-32B by 6 points on BBEH. Furthermore, mixing SynLogic data with mathematical and coding tasks improves training efficiency in those domains and significantly enhances reasoning generalization. Notably, our mixed training model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks. These findings position SynLogic as a valuable resource for advancing the broader reasoning capabilities of LLMs. We open-source both the data synthesis pipeline and the SynLogic dataset at https://github.com/MiniMax-AI/SynLogic.
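To make the idea of "adjustable difficulty" and "verification by simple rules" concrete, the following is a minimal sketch of one toy task. It is not taken from the SynLogic repository: the task (a small satisfiability puzzle), the function names `generate_task`, `render_prompt`, and `verify`, and the answer format `x1=True, x2=False` are all assumptions chosen for illustration.

```python
import random
import re


def generate_task(num_vars, num_clauses, seed):
    """Build a satisfiable puzzle; num_vars and num_clauses act as difficulty knobs."""
    rng = random.Random(seed)
    # Plant a hidden assignment so that every generated instance is solvable.
    planted = {v: rng.choice([True, False]) for v in range(1, num_vars + 1)}
    clauses = []
    while len(clauses) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), k=min(3, num_vars))
        clause = [(v, rng.choice([True, False])) for v in variables]
        # Keep only clauses that the planted assignment satisfies.
        if any(planted[v] == want for v, want in clause):
            clauses.append(clause)
    return clauses, planted


def render_prompt(clauses):
    """Turn the clause list into a natural-language question for the model."""
    parts = []
    for clause in clauses:
        lits = [f"x{v}" if want else f"NOT x{v}" for v, want in clause]
        parts.append("(" + " OR ".join(lits) + ")")
    return "Find True/False values satisfying: " + " AND ".join(parts)


def verify(clauses, answer):
    """Rule-based reward: 1.0 if the answer satisfies every clause, else 0.0."""
    assignment = {
        int(v): val == "True"
        for v, val in re.findall(r"x(\d+)\s*=\s*(True|False)", answer)
    }
    needed = {v for clause in clauses for v, _ in clause}
    if not needed.issubset(assignment):
        return 0.0  # an answer that omits a required variable is treated as wrong
    satisfied = all(
        any(assignment[v] == want for v, want in clause) for clause in clauses
    )
    return 1.0 if satisfied else 0.0
```

In this sketch, raising `num_vars` and `num_clauses` makes instances harder, and `verify` returns a binary reward without needing a learned judge. Any assignment that satisfies the clauses is accepted, not only the planted one, which is the property that makes such a check suitable as an RL reward signal.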
