QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
June 15, 2025
Authors: Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang
cs.AI
Abstract
Recent advancements in Long Chain-of-Thought (CoT) reasoning models have
improved performance on complex tasks, but they suffer from overthinking, which
generates redundant reasoning steps, especially for simple questions. This
paper revisits the reasoning patterns of Long and Short CoT models, observing
that the Short CoT patterns offer concise reasoning efficiently, while the Long
CoT patterns excel in challenging scenarios where the Short CoT patterns
struggle. To enable models to leverage both patterns, we propose Question-Free
Fine-Tuning (QFFT), a fine-tuning approach that removes the input question
during training and learns exclusively from Long CoT responses. This approach
enables the model to adaptively employ both reasoning patterns: it prioritizes
the Short CoT patterns and activates the Long CoT patterns only when necessary.
Experiments on various mathematical datasets demonstrate that QFFT reduces
average response length by more than 50%, while achieving performance
comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT exhibits
superior performance compared to SFT in noisy, out-of-domain, and low-resource
scenarios.
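
As a rough illustration of the training-data difference the abstract describes, the sketch below contrasts a standard SFT example, which conditions on the input question, with a QFFT example, where the question is removed and the model learns exclusively from the Long CoT response. The prompt template, field names, and helper functions are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of SFT vs. QFFT example construction, assuming a simple
# "Question/Answer" prompt template (not the authors' exact format).
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Example:
    question: str
    long_cot_response: str  # full Long CoT trace ending in the final answer


def build_sft_example(ex: Example) -> Tuple[str, str]:
    """Standard SFT: condition on the question, compute loss on the response."""
    prompt = f"Question: {ex.question}\nAnswer: "  # assumed template
    target = ex.long_cot_response
    return prompt, target


def build_qfft_example(ex: Example) -> Tuple[str, str]:
    """QFFT: the input question is removed; only the Long CoT response is learned."""
    prompt = ""  # no question in the input during training
    target = ex.long_cot_response
    return prompt, target


if __name__ == "__main__":
    ex = Example(
        question="What is 2 + 2?",
        long_cot_response="Let me check: 2 + 2 = 4. The answer is 4.",
    )
    print("SFT :", build_sft_example(ex))
    print("QFFT:", build_qfft_example(ex))
```

Under this reading, the loss is computed on the response tokens in both cases; the only change in QFFT is that the conditioning question is absent at training time, while at inference the model still receives the question as usual.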