X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
January 11, 2026
Authors: Jie Wu, Haoling Li, Xin Zhang, Jiani Guo, Jane Luo, Steven Liu, Yangyu Huang, Ruihang Chu, Scarlett Li, Yujiu Yang
cs.AI
Abstract
Competitive programming poses a significant challenge for Code LLMs due to its intensive reasoning demands and high logical complexity. However, current Code LLMs still rely heavily on real-world data, which limits their scalability. In this paper, we explore a fully synthetic approach: training Code LLMs on entirely generated tasks, solutions, and test cases to strengthen code reasoning without relying on real-world data. To this end, we propose SynthSmith, a novel data synthesis pipeline built on feature-based synthesis. SynthSmith shows strong potential for producing diverse and challenging tasks, along with verified solutions and tests, supporting both supervised fine-tuning and reinforcement learning. Based on the resulting synthetic SFT and RL datasets, we introduce the X-Coder model series, which achieves a notable pass rate of 62.9 avg@8 on LiveCodeBench v5 and 55.8 on v6, outperforming DeepCoder-14B-Preview and AReal-boba2-14B despite having only 7B parameters. In-depth analysis reveals that scaling laws hold on our synthetic dataset, and we examine which dimensions are most effective to scale. We further provide insights into code-centric reinforcement learning and, through detailed ablations and analysis, highlight the key factors that shape performance. Our findings demonstrate that scaling high-quality synthetic data and adopting staged training can greatly advance code reasoning while mitigating reliance on real-world coding data.
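The abstract states that synthetic solutions and tests are verified before being used for SFT and RL, but does not describe the mechanism. Below is a minimal, self-contained sketch of execution-based filtering under that assumption: a generated solution is kept only if it reproduces the expected output on every generated test within a time limit. The helper `passes_all_tests` and the toy `candidate`/`synthetic_tests` values are illustrative placeholders, not the actual SynthSmith implementation.

```python
"""Sketch of execution-based verification for synthetic (solution, tests) pairs.
This is an assumption about how 'verified solutions and tests' could be filtered,
not the paper's actual pipeline."""
import subprocess
import sys
import tempfile


def passes_all_tests(solution_src: str,
                     tests: list[tuple[str, str]],
                     time_limit: float = 2.0) -> bool:
    """Run a Python program (reads stdin, writes stdout) against
    (input, expected_output) pairs; reject on any mismatch, error, or timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_src)
        path = f.name
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            return False  # too slow: discard this synthetic sample
        if proc.returncode != 0 or proc.stdout.strip() != expected.strip():
            return False  # crash or wrong answer: discard
    return True


# Toy usage: a generated "solution" that prints the sum of two integers.
candidate = "a, b = map(int, input().split())\nprint(a + b)\n"
synthetic_tests = [("1 2", "3"), ("10 -4", "6")]
print(passes_all_tests(candidate, synthetic_tests))  # True -> keep the sample
```

In a pipeline like the one the abstract describes, samples that fail this check would simply be dropped, so both the SFT corpus and the RL reward signal rest only on solutions consistent with their own tests.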