Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

Abstract

Training next-generation code generation models requires high-quality datasets, yet existing datasets suffer from difficulty imbalance, format inconsistency, and data quality problems. We address these challenges through systematic data processing and difficulty scaling. We introduce a four-stage Data Processing Framework encompassing collection, processing, filtering, and verification. The framework incorporates Automatic Difficulty Filtering via an LLM-based predict-calibrate-select framework, which uses multi-dimensional difficulty metrics across five weighted dimensions to retain challenging problems while removing simple ones. The resulting MicroCoder dataset comprises tens of thousands of curated real competitive programming problems from diverse platforms, emphasizing recency and difficulty. Evaluations on the strictly unseen LiveCodeBench benchmark show that MicroCoder achieves 3x larger performance gains within 300 training steps than widely used baseline datasets of comparable size, with consistent advantages under both GRPO and its variant training algorithms. The MicroCoder dataset delivers clear improvements on medium and hard problems across different model sizes, achieving up to 17.2% relative gains in overall performance where model capabilities are most stretched. These results confirm that difficulty-aware data curation improves model performance on challenging tasks, and they offer several insights for dataset creation in code generation.
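
To make the predict-calibrate-select idea concrete, the following is a minimal sketch of difficulty filtering under stated assumptions. The abstract says only that five weighted dimensions are used. The dimension names, the weights, the linear calibration, and the threshold below are all hypothetical placeholders, not the authors' actual configuration. In practice the per-dimension scores would presumably come from an LLM; here they are supplied by hand.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical dimension names and weights (assumption: not taken from the paper).
# The weights are chosen to sum to 1 so the combined score stays in [0, 1].
WEIGHTS: Dict[str, float] = {
    "algorithmic_complexity": 0.30,
    "reasoning_depth": 0.25,
    "implementation_effort": 0.20,
    "edge_case_density": 0.15,
    "math_insight": 0.10,
}


@dataclass
class Problem:
    name: str
    # Per-dimension difficulty scores in [0, 1]; assumed to be predicted by an LLM.
    scores: Dict[str, float]


def predict(problem: Problem) -> float:
    """Predict step: combine per-dimension scores into one weighted difficulty."""
    return sum(WEIGHTS[dim] * problem.scores[dim] for dim in WEIGHTS)


def calibrate(raw: float, scale: float = 1.0, offset: float = 0.0) -> float:
    """Calibrate step: placeholder linear correction, clipped to [0, 1].

    A real system might fit scale and offset against observed solve rates.
    """
    return min(1.0, max(0.0, scale * raw + offset))


def select(problems: List[Problem], threshold: float = 0.5) -> List[Problem]:
    """Select step: keep problems whose calibrated difficulty meets the threshold."""
    return [p for p in problems if calibrate(predict(p)) >= threshold]


if __name__ == "__main__":
    easy = Problem("sum-two-numbers", {dim: 0.1 for dim in WEIGHTS})
    hard = Problem("dp-on-trees", {dim: 0.9 for dim in WEIGHTS})
    kept = select([easy, hard])
    print([p.name for p in kept])
```

With a threshold of 0.5, the sketch drops the problem scored 0.1 on every dimension and keeps the one scored 0.9, mirroring the stated goal of removing simple problems while retaining challenging ones.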
