SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

May 26, 2025
Authors: Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He
cs.AI

Abstract

Recent advances such as OpenAI-o1 and DeepSeek R1 have demonstrated the potential of Reinforcement Learning (RL) to enhance reasoning abilities in Large Language Models (LLMs). While open-source replication efforts have primarily focused on mathematical and coding domains, methods and resources for developing general reasoning capabilities remain underexplored. This gap is partly due to the challenge of collecting diverse and verifiable reasoning data suitable for RL. We hypothesize that logical reasoning is critical for developing general reasoning capabilities, as logic forms a fundamental building block of reasoning. In this work, we present SynLogic, a data synthesis framework and dataset that generates diverse logical reasoning data at scale, encompassing 35 distinct logical reasoning tasks. The SynLogic approach enables controlled synthesis of data with adjustable difficulty and quantity. Importantly, all examples can be verified by simple rules, making them ideally suited for RL with verifiable rewards. In our experiments, we validate the effectiveness of RL training on the SynLogic dataset based on 7B and 32B models. SynLogic leads to state-of-the-art logical reasoning performance among open-source datasets, surpassing DeepSeek-R1-Distill-Qwen-32B by 6 points on BBEH. Furthermore, mixing SynLogic data with mathematical and coding tasks improves the training efficiency of these domains and significantly enhances reasoning generalization. Notably, our mixed training model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks. These findings position SynLogic as a valuable resource for advancing the broader reasoning capabilities of LLMs. We open-source both the data synthesis pipeline and the SynLogic dataset at https://github.com/MiniMax-AI/SynLogic.
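To make concrete what "controlled synthesis with rule-based verification" means in practice, below is a minimal sketch of a generator/verifier pair in the style the abstract describes: the generator exposes a difficulty knob, and the verifier scores an answer with a deterministic rule so an RL reward needs no learned judge. The task, function names, and interface here are hypothetical illustrations, not the actual SynLogic API; see the linked repository for the real pipeline.

```python
import random

# Hypothetical example of one verifiable task in the generator/verifier
# pattern: the difficulty parameter controls instance hardness, and the
# verifier applies a simple rule, so rewards are cheap and unambiguous.

def generate_sum_puzzle(difficulty: int, rng: random.Random) -> dict:
    """Create one instance; difficulty scales operand count and magnitude."""
    nums = [rng.randint(1, 10 ** difficulty) for _ in range(difficulty + 1)]
    prompt = f"Compute the sum of {nums}. Answer with a single integer."
    return {"prompt": prompt, "nums": nums}

def verify_sum_puzzle(instance: dict, model_answer: str) -> float:
    """Rule-based reward: 1.0 iff the answer equals the true sum, else 0.0."""
    try:
        return float(int(model_answer.strip()) == sum(instance["nums"]))
    except ValueError:
        return 0.0  # unparseable answers earn no reward

if __name__ == "__main__":
    rng = random.Random(0)
    inst = generate_sum_puzzle(difficulty=2, rng=rng)
    print(inst["prompt"])
    print("reward:", verify_sum_puzzle(inst, str(sum(inst["nums"]))))
```

Because every task follows this pattern, difficulty and quantity can be tuned per task at generation time, and the same rule-based check that validates the data doubles as the verifiable reward signal during RL training.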

