OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Abstract

Since the advent of reasoning-based large language models, many researchers have had great success distilling reasoning capabilities into student models. Such techniques have significantly narrowed the gap between reasoning and standard LLMs on coding tasks. Despite this, much of the progress on distilling reasoning models remains locked behind proprietary datasets or lacks details on data curation, filtering, and subsequent training. To address this, we construct a superior supervised fine-tuning (SFT) dataset and use it to achieve state-of-the-art coding results across models of various sizes. Using only SFT, our distilled models achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning. We then analyze the data sources used to construct our dataset, the impact of code execution filtering, and the importance of instruction/solution diversity. We observe that execution filtering negatively affected benchmark accuracy, which led us to prioritize instruction diversity over solution correctness. Finally, we analyze the token efficiency and reasoning patterns of these models. We will open-source these datasets and distilled models to the community.
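The abstract does not say how code execution filtering was implemented. As a minimal sketch, assuming stdin/stdout-style problems, Python solutions, and (input, expected output) test pairs, execution filtering could look like the following. `Sample`, `passes_tests`, `execution_filter`, and `distinct_problems` are hypothetical names for illustration only, not part of the released dataset or code.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record: one problem identifier plus one model-generated solution.
    problem_id: str
    solution: str


def passes_tests(code: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> bool:
    """Run a Python solution on each (stdin, expected_stdout) pair; True only if all match."""
    # Note: real pipelines should sandbox untrusted model-generated code.
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        # A crash or a wrong answer both count as failure.
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True


def execution_filter(samples: list[Sample], tests_by_problem: dict) -> list[Sample]:
    """Keep only samples whose solution passes every test for its problem."""
    return [s for s in samples if passes_tests(s.solution, tests_by_problem[s.problem_id])]


def distinct_problems(samples: list[Sample]) -> int:
    # Number of unique instructions: a crude proxy for instruction diversity.
    return len({s.problem_id for s in samples})


# Toy data (hypothetical): two problems, three generated solutions.
tests = {
    "sum": [("1 2\n", "3"), ("5 7\n", "12")],
    "max": [("3 9\n", "9")],
}
samples = [
    Sample("sum", "a, b = map(int, input().split())\nprint(a + b)"),  # correct
    Sample("sum", "a, b = map(int, input().split())\nprint(a - b)"),  # wrong
    Sample("max", "a, b = map(int, input().split())\nprint(min(a, b))"),  # wrong
]
kept = execution_filter(samples, tests)
print(distinct_problems(samples), distinct_problems(kept))
```

In this toy run, filtering removes both incorrect solutions but also drops the `max` problem entirely, shrinking the number of distinct problems from two to one. This illustrates the kind of trade-off between solution correctness and instruction diversity that the abstract reports.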