Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Abstract

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing methods often suffer from uncontrolled generation, insufficient quality, and limited diversity in reasoning paths. Recent efforts leverage code to enhance CoT by grounding reasoning in executable steps, but such methods are typically constrained to predefined mathematical problems, hindering scalability and generalizability. In this work, we propose Caco (Code-Assisted Chain-of-ThOught), a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data through code-driven augmentation. Unlike prior work, Caco first fine-tunes a code-based CoT generator on existing math and programming solutions in a unified code format, then scales data generation to a large number of diverse reasoning traces. Crucially, we introduce automated validation via code execution and rule-based filtering to ensure logical correctness and structural diversity, followed by reverse-engineering filtered outputs into natural language instructions and language CoTs to enrich task adaptability. This closed-loop process enables fully automated, scalable synthesis of reasoning data with guaranteed executability. Experiments on our created Caco-1.3M dataset demonstrate that Caco-trained models achieve strong competitive performance on mathematical reasoning benchmarks, outperforming existing strong baselines. Further analysis reveals that Caco's code-anchored verification and instruction diversity contribute to superior generalization across unseen tasks. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
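To make the validation step more concrete, the following is a minimal sketch of execution-based filtering with one simple rule-based diversity check. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convention that each candidate program stores its result in a variable called `answer`, and the use of an AST dump as the duplicate-detection rule are all hypothetical. A real pipeline would also need sandboxing and timeouts, which are omitted here.

```python
import ast


def run_candidate(code: str):
    """Execute one candidate program; return its `answer`, or None on failure.

    Assumption: candidates store their final result in a variable `answer`.
    Warning: plain exec() is unsafe for untrusted code; use a sandbox in practice.
    """
    namespace = {}
    try:
        exec(compile(code, "<candidate>", "exec"), namespace)
    except Exception:
        return None
    return namespace.get("answer")


def structural_key(code: str) -> str:
    """Hypothetical diversity rule: compare programs by AST, ignoring
    whitespace and comments."""
    return ast.dump(ast.parse(code))


def filter_candidates(candidates):
    """Keep candidates that parse, are structurally new, and execute to an answer."""
    kept, seen = [], set()
    for code in candidates:
        try:
            key = structural_key(code)
        except SyntaxError:
            continue  # not valid Python
        if key in seen:
            continue  # structural duplicate of an already kept program
        answer = run_candidate(code)
        if answer is None:
            continue  # crashed or produced no answer
        seen.add(key)
        kept.append((code, answer))
    return kept


if __name__ == "__main__":
    samples = ["answer = 3 * 4", "answer = 3 * 4  # dup", "answer = 1 / 0"]
    for code, answer in filter_candidates(samples):
        print(repr(code), "->", answer)
```

In this sketch only programs that run to completion and yield an answer survive, which mirrors the "guaranteed executability" property described above. The surviving code and answer pairs would then be the inputs to the back-translation step that produces natural language instructions and CoTs.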
