CoT-Valve: Length-Compressible Chain-of-Thought Tuning

February 13, 2025
Authors: Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang
cs.AI

Abstract

Chain-of-Thought significantly enhances a model's reasoning capability, but it also comes with a considerable increase in inference cost due to long chains. Observing that the reasoning path can be easily compressed for easy tasks but is difficult to compress for hard tasks, we explore the feasibility of elastically controlling the length of reasoning paths with only one model, thereby reducing the inference overhead of reasoning models dynamically based on task difficulty. We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. To achieve this, we propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of the generated CoT. Moreover, we show that this property is valuable for compressing the reasoning chain. We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach. Our experiments show that CoT-Valve successfully enables controllability and compressibility of the chain and outperforms prompt-based control. We apply this method to QwQ-32B-Preview, reducing reasoning chains on GSM8K from 741 to 225 tokens with only a minor performance drop (95.07% to 94.92%) and on AIME from 6827 to 4629 tokens with only one additional incorrect answer.
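
The abstract does not spell out how the parameter-space direction is obtained, so the sketch below is only one plausible reading: treat the difference between a long-CoT model and a short-CoT fine-tune as a task vector, and scale it with a coefficient alpha to interpolate chain length. The names `apply_cot_valve`, `short_state`, and `alpha` are illustrative assumptions, not the paper's actual API.

```python
def apply_cot_valve(base_state, short_state, alpha):
    """Interpolate model weights along a length-controlling direction:

        theta(alpha) = theta_base + alpha * (theta_short - theta_base)

    alpha = 0 keeps the original long-chain model, alpha = 1 gives the
    short-chain fine-tune, and intermediate values trade reasoning-chain
    length against the base model's behavior. Deriving the direction from
    a short-CoT fine-tune is an assumption here, not the paper's recipe.
    """
    return {
        name: w + alpha * (short_state[name] - w)
        for name, w in base_state.items()
    }


# Tiny sanity check with scalar stand-ins for weight tensors:
base = {"w": 1.0}
short = {"w": 3.0}
assert apply_cot_valve(base, short, 0.5)["w"] == 2.0

# Hypothetical use with real models (names are illustrative):
# base_state, short_state = long_model.state_dict(), short_model.state_dict()
# for alpha in (0.0, 0.5, 1.0):  # elastic length control at inference time
#     long_model.load_state_dict(apply_cot_valve(base_state, short_state, alpha))
```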

