
Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

March 8, 2026
Authors: Zongqian Li, Tengchao Lv, Shaohan Huang, Yixuan Su, Qinzheng Sun, Qiufeng Yin, Ying Xin, Scarlett Li, Lei Cui, Nigel Collier, Furu Wei
cs.AI

Abstract

Training next-generation code generation models requires high-quality datasets, yet existing datasets suffer from difficulty imbalance, inconsistent formats, and data quality problems. We address these challenges through systematic data processing and difficulty scaling. We introduce a four-stage Data Processing Framework (collection, processing, filtering, and verification) that incorporates Automatic Difficulty Filtering: an LLM-based predict-calibrate-select framework that scores problems along five weighted difficulty dimensions, retaining challenging problems and removing simple ones. The resulting MicroCoder dataset comprises tens of thousands of curated, real competitive programming problems from diverse platforms, with an emphasis on recency and difficulty. Evaluations on the strictly unseen LiveCodeBench show that MicroCoder achieves 3x larger performance gains within 300 training steps than widely used baseline datasets of comparable size, with consistent advantages under both GRPO and its variants. MicroCoder delivers clear improvements on medium and hard problems across model sizes, achieving up to 17.2% relative gains in overall performance where model capabilities are most stretched. These results validate that difficulty-aware data curation improves model performance on challenging tasks and offer several insights for dataset creation in code generation.
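To make the predict-calibrate-select idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract says only that five weighted dimensions are used; the dimension names, the weights, the linear calibration step, and the threshold below are all hypothetical placeholders, and the per-dimension scores that would come from an LLM are passed in directly.

```python
from __future__ import annotations

# Hypothetical dimension names and weights. The abstract states that five
# weighted dimensions exist but does not name them or give their weights.
WEIGHTS = {"dim_a": 0.30, "dim_b": 0.25, "dim_c": 0.20, "dim_d": 0.15, "dim_e": 0.10}


def weighted_score(scores: dict[str, float]) -> float:
    """Predict: combine per-dimension scores (assumed LLM output) into one value."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / total


def fit_calibration(predicted: list[float], reference: list[float]) -> tuple[float, float]:
    """Calibrate: least-squares line mapping predicted scores to reference labels.

    Assumes a small set of problems with trusted difficulty labels is available.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(reference) / n
    var_x = sum((x - mean_x) ** 2 for x in predicted)
    if var_x == 0:
        # Degenerate case: all predictions equal, so only shift by the mean offset.
        return 1.0, mean_y - mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, reference))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def select_hard(
    problems: list[tuple[str, dict[str, float]]],
    slope: float,
    intercept: float,
    threshold: float,
) -> list[str]:
    """Select: keep problem ids whose calibrated difficulty meets the threshold."""
    return [
        pid
        for pid, scores in problems
        if slope * weighted_score(scores) + intercept >= threshold
    ]


if __name__ == "__main__":
    easy = {k: 2.0 for k in WEIGHTS}
    hard = {k: 8.0 for k in WEIGHTS}
    slope, intercept = fit_calibration([2.0, 8.0], [1.0, 9.0])
    print(select_hard([("p1", easy), ("p2", hard)], slope, intercept, threshold=5.0))
```

In this sketch the calibration step exists only to align raw predicted scores with a trusted difficulty scale before thresholding; the actual calibration procedure in the paper may differ.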