OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
April 2, 2025
Authors: Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Siddhartha Jain, Jocelyn Huang, Vahid Noroozi, Boris Ginsburg
cs.AI
Abstract
Since the advent of reasoning-based large language models, many works have successfully distilled reasoning capabilities into student models. Such techniques have significantly narrowed the gap between reasoning models and standard LLMs on coding tasks. Despite this, much of the progress on distilling reasoning models remains locked behind proprietary datasets or lacks details on data curation, filtering, and subsequent training. To address this, we construct a superior supervised fine-tuning (SFT) dataset that we use to achieve state-of-the-art coding performance in models of various sizes. Our distilled models use only SFT to achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning. We then analyze the data sources used to construct our dataset, the impact of code execution filtering, and the importance of instruction/solution diversity. We observe that execution filtering negatively affected benchmark accuracy, leading us to prioritize instruction diversity over solution correctness. Finally, we analyze the token efficiency and reasoning patterns exhibited by these models. We will open-source these datasets and distilled models to the community.
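To make the distillation recipe the abstract describes more concrete: in SFT-based reasoning distillation, each programming prompt is typically paired with a teacher reasoning model's chain of thought and final solution, and the student is fine-tuned on the resulting instruction-response pairs. The sketch below is only an illustrative assumption about the record layout; the `to_sft_record` helper, the `<think>` delimiters, and the chat schema are not taken from the paper.

```python
def to_sft_record(problem: str, reasoning: str, solution: str) -> dict:
    """Pack one problem, the teacher's reasoning trace, and the final
    code into a chat-style supervised fine-tuning example."""
    return {
        "messages": [
            {"role": "user", "content": problem},
            {
                "role": "assistant",
                # A common convention: the reasoning trace is wrapped in
                # <think> tags so the student learns to emit it before
                # the answer. This delimiter choice is an assumption.
                "content": f"<think>\n{reasoning}\n</think>\n\n{solution}",
            },
        ]
    }

record = to_sft_record(
    problem="Read an integer n and print n doubled.",
    reasoning="Single read, single multiply; no edge cases beyond parsing.",
    solution="print(int(input()) * 2)",
)
print(record["messages"][1]["content"])
```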
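Similarly, "code execution filtering" generally means running each candidate solution against the problem's test cases and discarding samples that fail. A minimal sketch, assuming stdin/stdout-style problems and a local `python3` interpreter; the helper and the data layout are hypothetical, not the authors' actual pipeline.

```python
import os
import subprocess
import tempfile

def passes_tests(solution_code, test_cases, timeout_s=5.0):
    """Write the candidate solution to a temp file and run it against
    each (stdin, expected stdout) pair; reject on a crash, a timeout,
    or any output mismatch."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code)
        path = f.name
    try:
        for stdin_text, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python3", path],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                return False  # treat hangs as failures
            if result.returncode != 0:
                return False
            if result.stdout.strip() != expected.strip():
                return False
        return True
    finally:
        os.unlink(path)

# Hypothetical dataset rows: a model-written solution plus test cases.
samples = [
    {"solution": "print(int(input()) * 2)", "tests": [("3", "6")]},  # passes
    {"solution": "print(int(input()) + 2)", "tests": [("3", "6")]},  # fails
]
filtered = [s for s in samples if passes_tests(s["solution"], s["tests"])]
print(len(filtered), "of", len(samples), "samples survive the filter")
```

Under a scheme like this, every crash, timeout, or mismatch removes a sample, which shrinks and homogenizes the instruction pool; that trade-off is consistent with the abstract's finding that execution filtering hurt benchmark accuracy, motivating the choice of instruction diversity over verified solution correctness.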