X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
January 11, 2026
Authors: Jie Wu, Haoling Li, Xin Zhang, Jiani Guo, Jane Luo, Steven Liu, Yangyu Huang, Ruihang Chu, Scarlett Li, Yujiu Yang
cs.AI
Abstract
Competitive programming poses significant challenges for Code LLMs due to its intensive reasoning demands and high logical complexity. Yet current Code LLMs still rely heavily on real-world data, which limits their scalability. In this paper, we explore a fully synthetic approach: training Code LLMs on entirely generated tasks, solutions, and test cases to build code reasoning ability without relying on real-world data. To support this, we propose SynthSmith, a novel data synthesis pipeline built on feature-based synthesis. SynthSmith shows strong potential for producing diverse, challenging tasks along with verified solutions and tests, supporting both supervised fine-tuning and reinforcement learning. Trained on the resulting synthetic SFT and RL datasets, the X-Coder model series achieves pass rates of 62.9 avg@8 on LiveCodeBench v5 and 55.8 on v6, outperforming DeepCoder-14B-Preview and AReal-boba2-14B despite having only 7B parameters. In-depth analysis shows that scaling laws hold on our synthetic dataset, and we examine which dimensions are most effective to scale. Through detailed ablations and analysis, we further provide insights into code-centric reinforcement learning and highlight the key factors that shape performance. Our findings demonstrate that scaling high-quality synthetic data and adopting staged training can substantially advance code reasoning while mitigating reliance on real-world coding data.
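To make the abstract's two key mechanisms concrete, the sketch below illustrates (a) how a generated candidate solution could be verified against generated test cases before entering an SFT/RL pool, and (b) how an avg@k pass rate of the kind reported above is typically computed. This is a minimal illustration under assumptions, not the paper's implementation: the helper names (TestCase, run_candidate, verify, avg_at_k) are hypothetical, and tasks are assumed to use the standard stdin/stdout competitive-programming format.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass


@dataclass
class TestCase:
    stdin: str
    expected_stdout: str


def run_candidate(source: str, test: TestCase, timeout: float = 2.0) -> bool:
    """Execute one candidate Python solution on one generated test case."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["python3", path],
            input=test.stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # Compare normalized stdout against the expected output.
        return result.stdout.strip() == test.expected_stdout.strip()
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)


def verify(source: str, tests: list[TestCase]) -> bool:
    """A candidate counts as verified only if it passes every generated test."""
    return all(run_candidate(source, t) for t in tests)


def avg_at_k(samples_per_task: list[list[bool]]) -> float:
    """avg@k: mean over tasks of the fraction of k sampled solutions that pass.

    Equivalently, pass@1 averaged over k independent samples per task.
    """
    return sum(sum(s) / len(s) for s in samples_per_task) / len(samples_per_task)
```

In this reading, a synthesized (task, solution, tests) triple would be retained for training only when verify returns True, and a score such as 62.9 avg@8 would correspond to sampling eight solutions per benchmark task and averaging their per-task pass fractions.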