OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
April 2, 2025
Authors: Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Siddhartha Jain, Jocelyn Huang, Vahid Noroozi, Boris Ginsburg
cs.AI
Abstract
Since the advent of reasoning-based large language models, many works have successfully distilled reasoning capabilities into student models. Such techniques have significantly narrowed the gap between reasoning models and standard LLMs on coding tasks. Despite this, much of the progress on distilling reasoning models remains locked behind proprietary datasets or lacks details on data curation, filtering, and subsequent training. To address this, we construct a superior supervised fine-tuning (SFT) dataset that we use to achieve state-of-the-art coding performance in models of various sizes. Our distilled models use only SFT to achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning. We then analyze the data sources used to construct our dataset, the impact of code execution filtering, and the importance of instruction/solution diversity. We observe that execution filtering negatively affected benchmark accuracy, leading us to prioritize instruction diversity over solution correctness. Finally, we analyze the token efficiency and reasoning patterns exhibited by these models. We will open-source these datasets and distilled models to the community.
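To make the distillation recipe the abstract describes more concrete: in SFT-based reasoning distillation, each programming prompt is typically paired with a teacher reasoning model's chain of thought and final solution, and the student is fine-tuned on the resulting instruction-response pairs. The sketch below is only an illustrative assumption about the record layout; the `to_sft_record` helper, the `<think>` delimiters, and the chat schema are not taken from the paper.

```python
def to_sft_record(problem: str, reasoning: str, solution: str) -> dict:
    """Pack one problem, the teacher's reasoning trace, and the final
    code into a chat-style supervised fine-tuning example."""
    return {
        "messages": [
            {"role": "user", "content": problem},
            {
                "role": "assistant",
                # A common convention: the reasoning trace is wrapped in
                # <think> tags so the student learns to emit it before
                # the answer. This delimiter choice is an assumption.
                "content": f"<think>\n{reasoning}\n</think>\n\n{solution}",
            },
        ]
    }

record = to_sft_record(
    problem="Read an integer n and print n doubled.",
    reasoning="Single read, single multiply; no edge cases beyond parsing.",
    solution="print(int(input()) * 2)",
)
print(record["messages"][1]["content"])
```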
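Similarly, "code execution filtering" generally means running each candidate solution against the problem's test cases and discarding samples that fail. A minimal sketch, assuming stdin/stdout-style problems and a local `python3` interpreter; the helper and the data layout are hypothetical, not the authors' actual pipeline.

```python
import os
import subprocess
import tempfile

def passes_tests(solution_code, test_cases, timeout_s=5.0):
    """Write the candidate solution to a temp file and run it against
    each (stdin, expected stdout) pair; reject on a crash, a timeout,
    or any output mismatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code)
        path = f.name
    try:
        for stdin_text, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python3", path],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                return False  # treat hangs as failures
            if result.returncode != 0:
                return False
            if result.stdout.strip() != expected.strip():
                return False
        return True
    finally:
        os.unlink(path)

# Hypothetical dataset rows: a model-written solution plus test cases.
samples = [
    {"solution": "print(int(input()) * 2)", "tests": [("3", "6")]},  # passes
    {"solution": "print(int(input()) + 2)", "tests": [("3", "6")]},  # fails
]
filtered = [s for s in samples if passes_tests(s["solution"], s["tests"])]
print(len(filtered), "of", len(samples), "samples survive the filter")
```

Under a scheme like this, every crash, timeout, or mismatch removes a sample, which shrinks and homogenizes the instruction pool; that trade-off is consistent with the abstract's finding that execution filtering hurt benchmark accuracy, motivating the choice of instruction diversity over verified solution correctness.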