CoRT: Code-integrated Reasoning within Thinking

June 11, 2025
Authors: Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu
cs.AI

Abstract

Large Reasoning Models (LRMs) such as o1 and DeepSeek-R1 have shown remarkable progress in natural-language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: a Code Interpreter (CI) brings external knowledge beyond the model's internal text representations, so naively combining the two is inefficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data-scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters with supervised fine-tuning, rejection fine-tuning, and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with models that reason in natural language alone. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
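
The abstract leaves the LRM-CI interaction protocol implicit. As a rough illustration of what "code-integrated reasoning within thinking" involves, the sketch below interleaves model generation with sandboxed code execution and injects a hint when the model keeps reasoning in natural language. The llm.generate call, the <code>/<output> tags, the \boxed{} answer marker, and the hint wording are illustrative assumptions for this sketch, not the paper's actual prompt format or implementation.

```python
# Minimal sketch of code-integrated reasoning within a thinking trace.
# Everything here is illustrative: llm.generate, the <code>/<output> tags,
# the \boxed{} answer marker, and the hint text are assumptions, not the
# actual CoRT interface.
import re
import subprocess
import sys

HINT = ("\nWait, this computation is error-prone by hand; "
        "let me verify it with Python.\n")  # example Hint-Engineering nudge


def run_python(code: str, timeout: int = 10) -> str:
    """Execute a model-written snippet in a subprocess and capture its output."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
        return (proc.stdout + proc.stderr).strip()
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"


def code_integrated_reasoning(llm, prompt: str, max_rounds: int = 8) -> str:
    """Interleave model 'thinking' with code execution until a final answer appears."""
    trace = prompt
    for _ in range(max_rounds):
        chunk = llm.generate(trace)  # placeholder for any text-completion API
        block = re.search(r"<code>(.*?)</code>", chunk, re.DOTALL)
        if block:
            # The model wrote code: keep the trace up to the end of that block,
            # run it, and splice the interpreter output back into the context.
            trace += chunk[: block.end()]
            result = run_python(block.group(1))
            trace += f"\n<output>\n{result}\n</output>\n"
        else:
            trace += chunk
            if "\\boxed{" in chunk:   # final answer reached
                break
            trace += HINT             # nudge the model toward the interpreter
    return trace
```

In the paper's pipeline, hints of this kind are used to synthesize the code-integrated training data (starting from the 30 hand-crafted samples), on which models from 1.5B to 32B parameters are post-trained with supervised fine-tuning, rejection fine-tuning, and reinforcement learning.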