
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

May 26, 2025
作者: Yifan Wu, Jingze Shi, Bingheng Wu, Jiayi Zhang, Xiaotian Lin, Nan Tang, Yuyu Luo
cs.AI

Abstract

Existing chain-of-thought (CoT) distillation methods can effectively transfer reasoning abilities to base models but suffer from two major limitations: excessive verbosity of reasoning traces and inadequate adaptability to problem difficulty. Long reasoning traces significantly increase inference costs, and uniform-length solutions prevent base models from learning adaptive reasoning strategies. To address these issues, we propose a difficulty-aware prompting (DAP) method that dynamically shortens reasoning traces without performance loss. In our approach, a large teacher model first judges each problem's difficulty and then rewrites its reasoning trace to an appropriately shorter length, yielding concise yet complete reasoning traces. Leveraging the DAP pipeline, we curate a distilled dataset called LiteCoT consisting of 100K concise reasoning examples, with solutions averaging only 720 tokens (an order of magnitude shorter than typical CoTs). Using LiteCoT, we distill a new family of reasoning models called Liter (1.5B, 7B, and 32B) based on the Qwen2.5 architecture. Experiments show that a student model fine-tuned on just 100K of these difficulty-pruned CoT samples outperforms a model distilled on 800K original long-CoT samples, while significantly reducing training and inference costs. Our method also generalizes well: across 11 diverse benchmarks, the shorter difficulty-aware CoTs achieve equal or better accuracy than long chains while using far fewer tokens. For example, on the challenging AIME24 exam, our approach reaches 74.2% Pass@1 using only about 5K inference tokens, surpassing other methods that consume many more tokens. Our code and data are available at https://github.com/Evanwu1125/LiteCoT.
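The two-stage pipeline the abstract describes, where a teacher first judges a problem's difficulty and then rewrites the trace to a matching length budget, can be sketched as below. This is a minimal illustration only: the helper names (`judge_difficulty`, `rewrite_trace`, `dap_prune`), the heuristic difficulty rule, and the token budgets are hypothetical stand-ins for calls to the teacher model, not the paper's actual implementation.

```python
# Illustrative token budgets per difficulty level; the paper reports solutions
# averaging ~720 tokens overall, but these exact buckets are assumptions.
DIFFICULTY_BUDGETS = {"easy": 256, "medium": 720, "hard": 2048}

def judge_difficulty(problem: str) -> str:
    """Stand-in for the teacher's difficulty judgment (here: a length heuristic;
    in the real pipeline, a large teacher model makes this call)."""
    if len(problem) > 200:
        return "hard"
    if len(problem) > 80:
        return "medium"
    return "easy"

def rewrite_trace(trace: list[str], budget_tokens: int) -> list[str]:
    """Stand-in for the teacher's rewrite step: keep reasoning steps until the
    token budget is exhausted (a real teacher would rewrite, not truncate)."""
    kept, used = [], 0
    for step in trace:
        cost = len(step.split())  # crude whitespace-token proxy
        if used + cost > budget_tokens:
            break
        kept.append(step)
        used += cost
    return kept

def dap_prune(problem: str, trace: list[str]) -> list[str]:
    """Full DAP sketch: judge difficulty, then shorten to the matching budget."""
    difficulty = judge_difficulty(problem)
    return rewrite_trace(trace, DIFFICULTY_BUDGETS[difficulty])
```

In the actual method, both stages are prompts to the teacher model, so easy problems receive aggressively shortened traces while hard problems retain longer, complete reasoning; the sketch only mirrors that control flow.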
