Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading

Abstract

Recently, DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) has demonstrated excellent reasoning ability on complex tasks and has publicly shared its methodology. This provides a potential source of high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To generate high-quality CoT data for different LLMs, we seek an efficient method for generating high-quality CoT data with LLM-Adaptive question difficulty levels. First, we grade the difficulty of the questions according to the reasoning ability of the LLMs themselves and construct an LLM-Adaptive question database. Second, we sample the question database according to a distribution over the difficulty levels of the questions and then use DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) to generate the corresponding high-quality CoT data with correct answers. Because the CoT data are constructed with LLM-Adaptive difficulty levels, we significantly reduce the cost of data generation and improve the efficiency of supervised fine-tuning (SFT). Finally, we validate the effectiveness and generalizability of the proposed method on complex mathematical competition and code generation tasks. Notably, with only 2k high-quality mathematical CoT samples, our ZMath-32B surpasses DeepSeek-Distill-32B on math reasoning tasks. Similarly, with only 2k high-quality code CoT samples, our ZCode-32B surpasses DeepSeek-Distill-32B on code reasoning tasks.
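The grading, sampling, and filtering steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how difficulty is measured, so the sketch assumes it is derived from the target model's empirical pass rate on each question. The level names, thresholds, target distribution, and all function names are hypothetical and chosen only for illustration. The call to the teacher model is omitted, and its outputs are represented as plain dictionaries.

```python
import random
from collections import defaultdict


def grade_difficulty(pass_rate, easy_min=0.75, hard_max=0.25):
    """Map the target model's pass rate on a question to a difficulty level.

    Assumption: pass rate is the grading signal; thresholds are hypothetical.
    """
    if pass_rate >= easy_min:
        return "easy"
    if pass_rate > hard_max:
        return "medium"
    return "hard"


def build_question_db(questions, pass_rates):
    """Group questions by the difficulty level assigned for this model."""
    db = defaultdict(list)
    for question, rate in zip(questions, pass_rates):
        db[grade_difficulty(rate)].append(question)
    return db


def sample_by_distribution(db, distribution, total, seed=0):
    """Draw about `total` questions following a target share per level."""
    rng = random.Random(seed)
    picked = []
    for level, share in distribution.items():
        pool = db.get(level, [])
        k = min(len(pool), round(total * share))  # never exceed the pool
        picked.extend(rng.sample(pool, k))
    return picked


def keep_correct(candidates, reference_answers):
    """Keep teacher generations whose final answer matches the reference."""
    return [
        c for c in candidates
        if c["answer"] == reference_answers[c["question"]]
    ]
```

In this sketch the same question pool yields a different database for each target model, because the pass rates, and therefore the assigned levels, depend on the model being fine-tuned.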

