CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

February 2, 2026
Authors: Zhongyuan Peng, Caijun Xu, Changyi Xiao, Shibo Hong, Eli Zhang, Stephen Huang, Yixin Cao
cs.AI

Abstract

Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty control, incur high computational costs, and struggle to generate competition-level questions at scale. In this paper, we propose CoDiQ (Controllable Difficult Question Generation), a framework that enables fine-grained difficulty control via test-time scaling while ensuring question solvability. First, we identify a test-time scaling tendency (a larger reasoning token budget raises difficulty but reduces solvability) and the intrinsic properties that define the upper bound of a model's ability to generate valid, high-difficulty questions. Second, we develop CoDiQ-Generator from Qwen3-8B, which raises this upper bound and is therefore well suited to constructing challenging questions. Using the CoDiQ framework, we build CoDiQ-Corpus, a set of 44K competition-grade question sequences. Human evaluations show that these questions are significantly more challenging than those in LiveCodeBench and AIME, with over 82% solvability. Training LRMs on CoDiQ-Corpus substantially improves reasoning performance, verifying that scaling controlled-difficulty training questions enhances reasoning capabilities. We open-source CoDiQ-Corpus, CoDiQ-Generator, and our implementations to support related research.