SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Abstract

Recent advances such as OpenAI-o1 and DeepSeek-R1 have demonstrated the potential of Reinforcement Learning (RL) to enhance reasoning abilities in Large Language Models (LLMs). While open-source replication efforts have primarily focused on mathematical and coding domains, methods and resources for developing general reasoning capabilities remain underexplored. This gap is partly due to the challenge of collecting diverse and verifiable reasoning data suitable for RL. We hypothesize that logical reasoning is critical for developing general reasoning capabilities, as logic forms a fundamental building block of reasoning. In this work, we present SynLogic, a data synthesis framework and dataset that generates logical reasoning data at scale, covering 35 diverse logical reasoning tasks. The SynLogic approach enables controlled synthesis of data with adjustable difficulty and quantity. Importantly, all examples can be verified by simple rules, making them ideally suited for RL with verifiable rewards. In our experiments, we validate the effectiveness of RL training on the SynLogic dataset based on 7B and 32B models. SynLogic leads to state-of-the-art logical reasoning performance among open-source datasets, surpassing DeepSeek-R1-Distill-Qwen-32B by 6 points on BBEH. Furthermore, mixing SynLogic data with mathematical and coding tasks improves the training efficiency of these domains and significantly enhances reasoning generalization. Notably, our mixed training model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks. These findings position SynLogic as a valuable resource for advancing the broader reasoning capabilities of LLMs. We open-source both the data synthesis pipeline and the SynLogic dataset at https://github.com/MiniMax-AI/SynLogic.
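To make "adjustable difficulty and quantity" and "verified by simple rules" concrete, the following is a minimal illustrative sketch, not the SynLogic implementation. The task (evaluating a nested Boolean expression), the function names `synthesize` and `verify`, the `Answer:` response format, and the use of nesting depth as the difficulty knob are all assumptions made for this example.

```python
import random
import re


def synthesize(n: int, difficulty: int, seed: int = 0) -> list[dict]:
    """Generate `n` toy examples; `difficulty` is the expression nesting depth.

    Hypothetical task for illustration only, not one of the SynLogic tasks.
    """
    rng = random.Random(seed)

    def build(depth: int) -> tuple[str, bool]:
        # Return the expression text together with its ground-truth value.
        if depth == 0:
            value = rng.choice([True, False])
            return str(value), value
        left_text, left_val = build(depth - 1)
        right_text, right_val = build(depth - 1)
        if rng.random() < 0.3:
            # Negate the left operand; `not` binds tighter than and/or.
            left_text, left_val = f"not {left_text}", not left_val
        op = rng.choice(["and", "or"])
        value = (left_val and right_val) if op == "and" else (left_val or right_val)
        return f"({left_text} {op} {right_text})", value

    examples = []
    for _ in range(n):
        text, value = build(difficulty)
        examples.append({
            "expression": text,
            "prompt": f"Evaluate: {text}\nEnd with 'Answer: True' or 'Answer: False'.",
            "answer": value,
        })
    return examples


# Assumed response format: the final line must read "Answer: True" or "Answer: False".
ANSWER_RE = re.compile(r"Answer:\s*(True|False)\s*$")


def verify(example: dict, response: str) -> float:
    """Rule-based binary reward: 1.0 if the final answer matches, else 0.0."""
    match = ANSWER_RE.search(response.strip())
    if match is None:
        return 0.0  # malformed responses earn no reward
    return 1.0 if match.group(1) == str(example["answer"]) else 0.0


if __name__ == "__main__":
    for ex in synthesize(n=2, difficulty=2, seed=42):
        print(ex["prompt"])
        print("reward for correct answer:", verify(ex, f"Answer: {ex['answer']}"))
```

Because the generator records the ground-truth value while it builds each expression, the reward needs no learned judge: raising `difficulty` deepens the nesting, and raising `n` scales the number of examples.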
