SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
May 26, 2025
Authors: Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He
cs.AI
Abstract
Recent advances such as OpenAI-o1 and DeepSeek R1 have demonstrated the
potential of Reinforcement Learning (RL) to enhance reasoning abilities in
Large Language Models (LLMs). While open-source replication efforts have
primarily focused on mathematical and coding domains, methods and resources for
developing general reasoning capabilities remain underexplored. This gap is
partly due to the challenge of collecting diverse and verifiable reasoning data
suitable for RL. We hypothesize that logical reasoning is critical for
developing general reasoning capabilities, as logic forms a fundamental
building block of reasoning. In this work, we present SynLogic, a data
synthesis framework and dataset that generates diverse logical reasoning data
at scale, encompassing 35 distinct logical reasoning tasks. The SynLogic
approach enables controlled synthesis of data with adjustable difficulty and
quantity. Importantly, all examples can be verified by simple rules, making
them ideally suited for RL with verifiable rewards. In our experiments, we
validate the effectiveness of RL training on the SynLogic dataset based on 7B
and 32B models. SynLogic leads to state-of-the-art logical reasoning
performance among open-source datasets, surpassing DeepSeek-R1-Distill-Qwen-32B
by 6 points on BBEH. Furthermore, mixing SynLogic data with mathematical and
coding tasks improves training efficiency in these domains and
significantly enhances reasoning generalization. Notably, our mixed-training
model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks. These
findings position SynLogic as a valuable resource for advancing the broader
reasoning capabilities of LLMs. We open-source both the data synthesis pipeline
and the SynLogic dataset at https://github.com/MiniMax-AI/SynLogic.
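
Since the abstract's key property is that every example "can be verified by simple rules," the minimal Python sketch below illustrates what such a setup could look like: a synthesizer with adjustable difficulty paired with a deterministic checker that yields a binary reward for RL with verifiable rewards. The ordering-puzzle task, the function names (`synthesize`, `verify`), and the reward scheme are illustrative assumptions, not the actual SynLogic pipeline; see the repository above for the real implementation.

```python
# Hypothetical sketch (not the actual SynLogic code): synthesize a
# rule-verifiable logic task with controllable difficulty, plus a
# deterministic checker usable as an RL reward.
import random

def synthesize(num_items: int, num_constraints: int, seed: int = 0):
    """Generate an ordering puzzle: items A, B, C, ... plus
    'X comes before Y' constraints drawn from a hidden ground-truth
    ordering, so every instance is guaranteed solvable. Difficulty is
    tuned via num_items/num_constraints; quantity via the seed."""
    rng = random.Random(seed)
    items = [chr(ord("A") + i) for i in range(num_items)]
    truth = items[:]
    rng.shuffle(truth)
    pos = {x: i for i, x in enumerate(truth)}
    constraints = set()
    while len(constraints) < num_constraints:
        x, y = rng.sample(items, 2)
        if pos[x] < pos[y]:  # keep only constraints the hidden truth satisfies
            constraints.add((x, y))
    prompt = ("Order the items " + ", ".join(items) + " so that: "
              + "; ".join(f"{x} comes before {y}" for x, y in sorted(constraints)))
    return prompt, constraints, items

def verify(answer: str, constraints, items) -> float:
    """Rule-based verification: reward 1.0 iff the answer is a
    permutation of the items satisfying every constraint, else 0.0."""
    order = answer.replace(",", " ").split()
    if sorted(order) != sorted(items):  # must use each item exactly once
        return 0.0
    pos = {x: i for i, x in enumerate(order)}
    return 1.0 if all(pos[x] < pos[y] for x, y in constraints) else 0.0

prompt, cons, items = synthesize(num_items=5, num_constraints=4, seed=42)
print(prompt)                            # the synthesized question
print(verify("A B C D E", cons, items))  # reward for one candidate answer
```

A deterministic, binary reward of this kind is what lets synthesized data plug directly into an RL-with-verifiable-rewards training loop without a learned reward model or an LLM judge.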