
Not All Correct Answers Are Equal: Why Your Distillation Source Matters

May 20, 2025
作者: Xiaoyu Tian, Yunjie Ji, Haotian Wang, Shuaiting Chen, Sitong Zhao, Yiping Peng, Han Zhao, Xiangang Li
cs.AI

Abstract

Distillation has emerged as a practical and effective approach to enhance the reasoning capabilities of open-source language models. In this work, we conduct a large-scale empirical study on reasoning data distillation by collecting verified outputs from three state-of-the-art teacher models (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) on a shared corpus of 1.89 million queries. We construct three parallel datasets and analyze their distributions, revealing that the AM-Thinking-v1-distilled data exhibits greater token-length diversity and lower perplexity. Student models trained on each dataset are evaluated on reasoning benchmarks including AIME2024, AIME2025, MATH500, and LiveCodeBench. The AM-based model consistently achieves the best performance (e.g., 84.3 on AIME2024, 72.2 on AIME2025, 98.4 on MATH500, and 65.9 on LiveCodeBench) and demonstrates adaptive output behavior, producing longer responses for harder tasks and shorter ones for simpler tasks. These findings highlight the value of high-quality, verified reasoning traces. We release the AM-Thinking-v1 and Qwen3-235B-A22B distilled datasets to support future research on open, high-performing reasoning-oriented language models. The datasets are publicly available on Hugging Face: https://huggingface.co/datasets/a-m-team/AM-Thinking-v1-Distilled and https://huggingface.co/datasets/a-m-team/AM-Qwen3-Distilled.
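
As a quick way to inspect the released data, the sketch below loads one of the distilled datasets from Hugging Face and summarizes response token lengths, the kind of token-length-diversity statistic the abstract refers to. It is a minimal sketch, not the authors' analysis code: the split name, the `conversations` field layout with role/content turns, and the choice of tokenizer are assumptions, since the abstract does not specify the dataset schema.

```python
# Minimal sketch: inspect token-length statistics of a distilled dataset.
# Assumptions (not stated in the abstract): a "train" split exists and each
# record stores teacher turns under a "conversations" list of role/content
# dicts -- adjust the field names to the actual schema on Hugging Face.
import statistics

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("a-m-team/AM-Thinking-v1-Distilled", split="train")
# Any tokenizer gives a rough length count; the exact choice is an assumption.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")


def response_text(example):
    """Return the last assistant turn as the distilled reasoning trace (assumed layout)."""
    turns = example.get("conversations", [])
    assistant = [t["content"] for t in turns if t.get("role") == "assistant"]
    return assistant[-1] if assistant else ""


lengths = []
for example in dataset.select(range(1000)):  # small sample for speed
    text = response_text(example)
    if text:
        lengths.append(len(tokenizer.encode(text)))

print(
    f"n={len(lengths)}  mean={statistics.mean(lengths):.1f}  "
    f"stdev={statistics.stdev(lengths):.1f}  max={max(lengths)}"
)
```

Running the same script against the Qwen3-235B-A22B and DeepSeek-R1 distillations of the shared query corpus would give a side-by-side view of the length-diversity comparison described above.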
