Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

Abstract

Existing chain-of-thought (CoT) distillation methods can effectively transfer reasoning abilities to base models, but they suffer from two major limitations: reasoning traces are excessively verbose, and they do not adapt to problem difficulty. Long reasoning traces significantly increase inference costs, and uniform-length solutions prevent base models from learning adaptive reasoning strategies. To address these issues, we propose difficulty-aware prompting (DAP), a method that dynamically shortens reasoning traces without loss of performance. In our approach, a large teacher model first judges each problem's difficulty and then rewrites its reasoning trace to an appropriately shorter length, yielding concise yet complete reasoning traces. Leveraging the DAP pipeline, we curate a distilled dataset called LiteCoT consisting of 100K concise reasoning examples, with solutions averaging only 720 tokens (an order of magnitude shorter than typical CoTs). Using LiteCoT, we distill a new family of reasoning models called Liter (1.5B, 7B, and 32B) based on the Qwen2.5 architecture. Experiments show that a student model fine-tuned on just 100K of these difficulty-pruned CoT samples outperforms a model distilled on 800K original Long CoT samples, while significantly reducing training and inference costs. Our method also generalizes well: across 11 diverse benchmarks, the shorter difficulty-aware CoTs match or exceed the accuracy of Long CoT chains while using far fewer tokens. For example, on the challenging AIME24 exam, our approach reaches 74.2% Pass@1 using only about 5K inference tokens, surpassing other methods that consume many more tokens. Our code and data are available at https://github.com/Evanwu1125/LiteCoT.
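The following minimal Python sketch illustrates the two-step DAP idea: the teacher judges difficulty and then rewrites the trace to a difficulty-dependent length. It rests on several assumptions. The teacher is modeled as a plain callable that maps a prompt to a completion. The three difficulty labels, their token budgets, the prompt wording, and all helper names (`judge_difficulty`, `dap_rewrite`, `stub_teacher`) are hypothetical illustrations. They are not the authors' actual prompts or code, which are in the LiteCoT repository.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical difficulty labels and per-label token budgets (illustrative
# assumptions, not values taken from the LiteCoT paper).
BUDGETS = {"easy": 300, "medium": 800, "hard": 2000}

JUDGE_PROMPT = (
    "Rate the difficulty of the following problem as easy, medium, or hard. "
    "Answer with one word.\n\nProblem:\n{problem}"
)
REWRITE_PROMPT = (
    "The problem below is {level}. Rewrite the reasoning so it stays complete "
    "and correct but uses at most about {budget} tokens.\n\n"
    "Problem:\n{problem}\n\nOriginal reasoning:\n{trace}"
)

# Assumed teacher interface: prompt string in, completion string out.
Teacher = Callable[[str], str]


@dataclass
class Example:
    problem: str
    level: str
    concise_trace: str


def judge_difficulty(teacher: Teacher, problem: str) -> str:
    """Step 1: ask the teacher for a label; fall back to 'hard' if unparseable."""
    answer = teacher(JUDGE_PROMPT.format(problem=problem)).strip().lower()
    return answer if answer in BUDGETS else "hard"


def dap_rewrite(teacher: Teacher, problem: str, long_trace: str) -> Example:
    """Step 2: rewrite the long trace under the budget chosen for its difficulty."""
    level = judge_difficulty(teacher, problem)
    prompt = REWRITE_PROMPT.format(
        level=level, budget=BUDGETS[level], problem=problem, trace=long_trace
    )
    return Example(problem, level, teacher(prompt).strip())


# Usage with a hypothetical stub in place of a real large teacher model.
def stub_teacher(prompt: str) -> str:
    if prompt.startswith("Rate the difficulty"):
        return "Easy"
    return "2 + 3 = 5, so the answer is 5."


if __name__ == "__main__":
    ex = dap_rewrite(stub_teacher, "What is 2 + 3?", "Let me think step by step ... " * 50)
    print(ex.level, "|", ex.concise_trace)  # prints: easy | 2 + 3 = 5, so the answer is 5.
```

When the teacher's label cannot be parsed, the sketch falls back to the largest budget. This is a choice of the sketch, not of the paper: it errs toward keeping reasoning rather than over-pruning it.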
