Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading

Abstract

Recently, DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) has demonstrated excellent reasoning ability on complex tasks and has publicly shared its methodology. This makes it a potential source of high-quality chain-of-thought (CoT) data for stimulating the reasoning abilities of small-sized large language models (LLMs). To generate high-quality CoT data for different LLMs, we seek an efficient method for generating CoT data with LLM-Adaptive question difficulty levels. First, we grade the difficulty of the questions according to the reasoning ability of the LLMs themselves and construct an LLM-Adaptive question database. Second, we sample the question database according to a distribution over question difficulty levels and then use DeepSeek-R1 (671B) (DeepSeek-AI et al., 2025) to generate the corresponding high-quality CoT data with correct answers. By constructing CoT data with LLM-Adaptive difficulty levels, we significantly reduce the cost of data generation and improve the efficiency of supervised fine-tuning (SFT). Finally, we validate the effectiveness and generalizability of the proposed method on complex mathematical competition and code generation tasks. Notably, with only 2k high-quality mathematical CoT examples, our ZMath-32B surpasses DeepSeek-Distill-32B on math reasoning tasks. Similarly, with only 2k high-quality code CoT examples, our ZCode-32B surpasses DeepSeek-Distill-32B on code reasoning tasks.
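The grade-then-sample pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how difficulty is measured, so everything concrete here is an assumption made for illustration: the use of the target model's pass rate as the difficulty signal, the three level names, the thresholds, and the function names `grade_difficulty`, `build_database`, and `sample_by_distribution`. The final step, generating CoT data with DeepSeek-R1 for the sampled questions, is left out.

```python
import random
from collections import defaultdict


def grade_difficulty(pass_rate, thresholds=(0.8, 0.4)):
    """Map the target model's pass rate on one question to a level.

    Assumption: a high pass rate means the question is easy for this
    particular model. The thresholds are illustrative, not from the paper.
    """
    easy_min, medium_min = thresholds
    if pass_rate >= easy_min:
        return "easy"
    if pass_rate >= medium_min:
        return "medium"
    return "hard"


def build_database(pass_rates):
    """Group question ids by level; pass_rates maps question id -> pass rate."""
    db = defaultdict(list)
    for qid, rate in pass_rates.items():
        db[grade_difficulty(rate)].append(qid)
    return dict(db)


def sample_by_distribution(db, distribution, total, seed=0):
    """Draw about `total` questions, split across levels by `distribution`."""
    rng = random.Random(seed)
    picked = []
    for level, share in distribution.items():
        pool = db.get(level, [])
        # Never request more questions than the level actually contains.
        k = min(len(pool), round(total * share))
        picked.extend(rng.sample(pool, k))
    return picked


# Hypothetical pass rates, e.g. the fraction of correct answers over
# several attempts by the model that will later be fine-tuned.
pass_rates = {"q1": 1.0, "q2": 0.5, "q3": 0.0, "q4": 0.9, "q5": 0.25, "q6": 0.6}
db = build_database(pass_rates)
selected = sample_by_distribution(
    db, {"easy": 0.25, "medium": 0.25, "hard": 0.5}, total=4
)
```

Because the levels are derived from the target model's own results, the same question pool would yield a different database for a different model, which is the sense in which the grading is LLM-adaptive. The skew toward hard questions in the usage example is an arbitrary choice, not a distribution reported by the authors.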
