CoRT: Code-integrated Reasoning within Thinking

June 11, 2025
Authors: Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu
cs.AI

Abstract

Large Reasoning Models (LRMs) such as o1 and DeepSeek-R1 have shown remarkable progress in natural-language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: a Code Interpreter (CI) brings external knowledge beyond the model's internal text representations, so naively combining the two is inefficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data-scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters with supervised fine-tuning, rejection fine-tuning, and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with models that reason in natural language alone. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
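
The abstract leaves the LRM-CI interaction protocol implicit. As a rough illustration of what "code-integrated reasoning within thinking" involves, the sketch below interleaves model generation with sandboxed code execution and injects a hint when the model keeps reasoning in natural language. The llm.generate call, the <code>/<output> tags, the \boxed{} answer marker, and the hint wording are illustrative assumptions for this sketch, not the paper's actual prompt format or implementation.

```python
# Minimal sketch of code-integrated reasoning within a thinking trace.
# Everything here is illustrative: llm.generate, the <code>/<output> tags,
# the \boxed{} answer marker, and the hint text are assumptions, not the
# actual CoRT interface.
import re
import subprocess
import sys

HINT = ("\nWait, this computation is error-prone by hand; "
        "let me verify it with Python.\n")  # example Hint-Engineering nudge


def run_python(code: str, timeout: int = 10) -> str:
    """Execute a model-written snippet in a subprocess and capture its output."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
        return (proc.stdout + proc.stderr).strip()
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"


def code_integrated_reasoning(llm, prompt: str, max_rounds: int = 8) -> str:
    """Interleave model 'thinking' with code execution until a final answer appears."""
    trace = prompt
    for _ in range(max_rounds):
        chunk = llm.generate(trace)  # placeholder for any text-completion API
        block = re.search(r"<code>(.*?)</code>", chunk, re.DOTALL)
        if block:
            # The model wrote code: keep the trace up to the end of that block,
            # run it, and splice the interpreter output back into the context.
            trace += chunk[: block.end()]
            result = run_python(block.group(1))
            trace += f"\n<output>\n{result}\n</output>\n"
        else:
            trace += chunk
            if "\\boxed{" in chunk:   # final answer reached
                break
            trace += HINT             # nudge the model toward the interpreter
    return trace
```

In the paper's pipeline, hints of this kind are used to synthesize the code-integrated training data (starting from the 30 hand-crafted samples), on which models from 1.5B to 32B parameters are post-trained with supervised fine-tuning, rejection fine-tuning, and reinforcement learning.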