
TTCS: Test-Time Curriculum Synthesis for Self-Evolving

January 30, 2026
Authors: Chengyi Yang, Zhishang Xiang, Yunbo Tang, Zongpei Teng, Chengsong Huang, Fei Long, Yuhan Liu, Jinsong Su
cs.AI

Abstract

Test-Time Training offers a promising way to improve the reasoning ability of large language models (LLMs) by adapting the model using only the test questions. However, existing methods struggle with difficult reasoning problems for two reasons: raw test questions are often too difficult to yield high-quality pseudo-labels, and the limited size of test sets makes continuous online updates prone to instability. To address these limitations, we propose TTCS, a co-evolving test-time training framework. Specifically, TTCS initializes two policies from the same pretrained model: a question synthesizer and a reasoning solver. These policies evolve through iterative optimization: the synthesizer generates progressively challenging question variants conditioned on the test questions, creating a structured curriculum tailored to the solver's current capability, while the solver updates itself using self-consistency rewards computed from multiple sampled responses on both the original test questions and the synthetic ones. Crucially, the solver's feedback guides the synthesizer to generate questions aligned with the model's current capability, and the generated question variants in turn stabilize the solver's test-time training. Experiments show that TTCS consistently strengthens reasoning ability on challenging mathematical benchmarks and transfers to general-domain tasks across different LLM backbones, highlighting a scalable path toward dynamically constructing test-time curricula for self-evolving models. Our code and implementation details are available at https://github.com/XMUDeepLIT/TTCS.
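The self-consistency reward mentioned in the abstract can be sketched as a majority-vote scheme: sample several responses per question, take the most frequent final answer as a pseudo-label, and reward each sample by agreement with it. The helper below is an illustrative assumption based on that description, not the authors' implementation (function and variable names are hypothetical):

```python
from collections import Counter

def self_consistency_rewards(final_answers):
    """Reward each sampled response by agreement with the majority-vote
    pseudo-label over all sampled final answers for one question.

    Illustrative sketch only; TTCS's actual reward may differ in detail.
    """
    counts = Counter(final_answers)
    pseudo_label, _ = counts.most_common(1)[0]  # most frequent answer
    return [1.0 if ans == pseudo_label else 0.0 for ans in final_answers]

# Example: four sampled answers to one test question.
rewards = self_consistency_rewards(["42", "41", "42", "42"])
# Samples matching the majority answer "42" get reward 1.0.
```

No ground-truth labels are needed, which is what makes this reward usable at test time on both original and synthesized questions.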