Compositional Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a cornerstone technique for shaping the coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of verifiable code tasks that are sufficiently challenging and sit near the edge of the model's competence. Prior studies often rely on heuristic seed expansion for data synthesis, which limits both the novelty and the difficulty of the resulting tasks; consequently, the training value of such data fails to grow in step with the scale of synthesis. To address this, we propose Atomic Decomposition and Recombination (ADR), a framework that decomposes tasks into atomic elements and recombines them in a controlled manner, yielding verifiable code tasks that are genuinely novel and challenging. Experiments and analysis show that ADR outperforms existing baselines in originality, difficulty, diversity, and test quality, and that it consistently delivers larger gains in coding ability under RLVR training across diverse downstream domains, including algorithmic programming, tool use, and data science. Our work points to a new paradigm for novel code task synthesis and scalable RLVR training.