QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
June 15, 2025
Authors: Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang
cs.AI
Abstract
Recent advancements in Long Chain-of-Thought (CoT) reasoning models have
improved performance on complex tasks, but they suffer from overthinking, which
generates redundant reasoning steps, especially for simple questions. This
paper revisits the reasoning patterns of Long and Short CoT models, observing
that the Short CoT patterns offer concise reasoning efficiently, while the Long
CoT patterns excel in challenging scenarios where the Short CoT patterns
struggle. To enable models to leverage both patterns, we propose Question-Free
Fine-Tuning (QFFT), a fine-tuning approach that removes the input question
during training and learns exclusively from Long CoT responses. This approach
enables the model to adaptively employ both reasoning patterns: it prioritizes
the Short CoT patterns and activates the Long CoT patterns only when necessary.
Experiments on various mathematical datasets demonstrate that QFFT reduces
average response length by more than 50%, while achieving performance
comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT exhibits
superior performance compared to SFT in noisy, out-of-domain, and low-resource
scenarios.
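
As a rough illustration of the training-data difference the abstract describes, the sketch below contrasts a standard SFT example, which conditions on the input question, with a QFFT example, where the question is removed and the model learns exclusively from the Long CoT response. The prompt template, field names, and helper functions are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of SFT vs. QFFT example construction, assuming a simple
# "Question/Answer" prompt template (not the authors' exact format).
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Example:
    question: str
    long_cot_response: str  # full Long CoT trace ending in the final answer


def build_sft_example(ex: Example) -> Tuple[str, str]:
    """Standard SFT: condition on the question, compute loss on the response."""
    prompt = f"Question: {ex.question}\nAnswer: "  # assumed template
    target = ex.long_cot_response
    return prompt, target


def build_qfft_example(ex: Example) -> Tuple[str, str]:
    """QFFT: the input question is removed; only the Long CoT response is learned."""
    prompt = ""  # no question in the input during training
    target = ex.long_cot_response
    return prompt, target


if __name__ == "__main__":
    ex = Example(
        question="What is 2 + 2?",
        long_cot_response="Let me check: 2 + 2 = 4. The answer is 4.",
    )
    print("SFT :", build_sft_example(ex))
    print("QFFT:", build_qfft_example(ex))
```

Under this reading, the loss is computed on the response tokens in both cases; the only change in QFFT is that the conditioning question is absent at training time, while at inference the model still receives the question as usual.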