Teaching Language Models to Think in Code
May 11, 2026
Authors: Hyeon Hwang, Jiwoo Lee, Jaewoo Kang
cs.AI
Abstract
Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on five competition-level math benchmarks and even surpasses the much larger Qwen3-235B-A22B-Thinking. Further analysis shows that ThinC reasons through code: 99.2% of its final answers are grounded in interpreter output, and the model recovers reliably from code execution failures without intermediate NL reasoning. Our code and models will be released soon.
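To make the trajectory format concrete, the sketch below shows what a code-centric reasoning loop of this kind might look like. It is a minimal, hypothetical illustration, not the paper's released implementation: the `model.generate` interface, the prompt wording, the `[interpreter output]` delimiter, and the fenced-code extraction are all assumptions introduced here.

```python
# Hypothetical sketch of a ThinC-style loop: a brief NL plan, then code
# blocks bridged only by their execution outputs. NOT the authors' code.
import contextlib
import io
import re


def extract_last_code_block(text: str):
    """Pull the most recent ```python ...``` block from a generation."""
    blocks = re.findall(r"```python\n(.*?)```", text, re.DOTALL)
    return blocks[-1] if blocks else None


def run_python(code: str) -> str:
    """Execute one code block, capturing stdout; on failure, return the
    error message so the next code block can attempt recovery."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return buf.getvalue()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def solve(model, problem: str, max_steps: int = 8) -> str:
    # Step 1: a brief NL planning step opens the trajectory.
    prompt = f"Plan briefly, then reason only in code:\n{problem}\n"
    latest = model.generate(prompt)  # assumed interface: str -> str
    trajectory = prompt + latest
    # Step 2: each subsequent code block sees only the execution outputs.
    for _ in range(max_steps):
        code = extract_last_code_block(latest)
        if code is None:  # no new code block: the model gave a final answer
            break
        trajectory += f"\n[interpreter output]\n{run_python(code)}\n"
        latest = model.generate(trajectory)  # continue from the output
        trajectory += latest
    return trajectory
```

Errors are deliberately fed back as interpreter output rather than hidden, matching the abstract's claim that the model recovers from execution failures without intermediate NL reasoning.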