CoRT: Code-integrated Reasoning within Thinking

Abstract

Large Reasoning Models (LRMs) such as o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations with computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: a Code Interpreter (CI) brings in external knowledge beyond the model's internal text representations, so combining the two directly is not efficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage a CI effectively and efficiently. As a first step, we address the data scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters with supervised fine-tuning, rejection fine-tuning, and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets. Furthermore, compared with the natural language models, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
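The abstract names rejection fine-tuning as one post-training stage but does not describe its filtering rule. As a rough illustration of the general idea only, the sketch below keeps sampled reasoning trajectories whose final answer matches a reference answer, so that only those are used for further fine-tuning. The `Trajectory` type, the `normalize` helper, and the exact-match criterion are all assumptions made for this example and are not taken from the CoRT code.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled solution (hypothetical structure, not from the CoRT repo)."""
    problem_id: str
    text: str          # full reasoning trace, possibly with code and CI output
    final_answer: str  # answer extracted from the end of the trace


def normalize(answer: str) -> str:
    """Assumed normalizer: trim whitespace, drop trailing periods, lowercase."""
    return answer.strip().rstrip(".").lower()


def rejection_filter(samples, references):
    """Keep only trajectories whose final answer matches the reference answer.

    Assumption: exact match after normalization; a real pipeline may apply
    additional or different criteria.
    """
    kept = []
    for sample in samples:
        ref = references.get(sample.problem_id)
        if ref is not None and normalize(sample.final_answer) == normalize(ref):
            kept.append(sample)
    return kept


# Toy data for illustration only.
samples = [
    Trajectory("p1", "... reasoning with code ...", "42"),
    Trajectory("p1", "... reasoning by hand ...", "41"),
    Trajectory("p2", "... reasoning with code ...", " 3/4. "),
]
references = {"p1": "42", "p2": "3/4"}

kept = rejection_filter(samples, references)
print([(t.problem_id, t.final_answer) for t in kept])
```

Exact matching is the simplest possible filter; it is used here only to keep the example short, and the surviving trajectories would then serve as supervised fine-tuning data.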
