TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Abstract

Test-time training offers a promising way to improve the reasoning ability of large language models (LLMs) by adapting the model using only the test questions. However, existing methods struggle with difficult reasoning problems for two reasons: raw test questions are often too difficult to yield high-quality pseudo-labels, and the limited size of test sets makes continuous online updates unstable. To address these limitations, we propose TTCS, a co-evolving test-time training framework. Specifically, TTCS initializes two policies from the same pretrained model: a question synthesizer and a reasoning solver. The two policies evolve through iterative optimization. The synthesizer generates progressively more challenging question variants conditioned on the test questions, creating a structured curriculum tailored to the solver's current capability. The solver, in turn, samples multiple responses to both the original test questions and the synthetic questions, and updates itself using self-consistency rewards computed from those responses. Crucially, the solver's feedback guides the synthesizer to generate questions aligned with the model's current capability, and the generated question variants in turn stabilize the solver's test-time training. Experiments show that TTCS consistently strengthens reasoning ability on challenging mathematical benchmarks and transfers to general-domain tasks across different LLM backbones, highlighting a scalable path towards dynamically constructing test-time curricula for self-evolution. Our code and implementation details are available at https://github.com/XMUDeepLIT/TTCS.
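
The self-consistency reward mentioned in the abstract can be illustrated with a minimal sketch. It assumes the reward is obtained by majority voting over the final answers of several sampled responses, which is a common way to build pseudo-labels but is not spelled out in the abstract. The function names, the binary reward, and in particular the capability_alignment_score heuristic are hypothetical and are not taken from the TTCS code.

```python
from collections import Counter


def self_consistency_rewards(answers):
    """Return (pseudo_label, rewards, agreement) for one question.

    answers: final answers extracted from several sampled solver responses.
    Assumption: the pseudo-label is the majority answer among the samples.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # Majority vote: the most frequent answer serves as the pseudo-label.
    pseudo_label, votes = counts.most_common(1)[0]
    # Binary reward: 1.0 if a sample agrees with the pseudo-label, else 0.0.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    # Fraction of samples that agree with the pseudo-label.
    agreement = votes / len(answers)
    return pseudo_label, rewards, agreement


def capability_alignment_score(agreement):
    """Hypothetical synthesizer signal, not from the paper.

    Scores a synthetic question highest when the solver agrees with itself
    about half the time, i.e. the question is neither trivial nor hopeless.
    """
    return 1.0 - 2.0 * abs(agreement - 0.5)


if __name__ == "__main__":
    label, rewards, agreement = self_consistency_rewards(["42", "42", "17", "42"])
    print(label, rewards, agreement, capability_alignment_score(agreement))
```

In this sketch the solver would be updated with the per-sample rewards, while the agreement rate is one plausible form of the feedback that steers the synthesizer towards questions matched to the solver's current capability.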
