
QFFT: Question-Free Fine-Tuning for Adaptive Reasoning

June 15, 2025
Authors: Wanlong Liu, Junxiao Xu, Fei Yu, Yukang Lin, Ke Ji, Wenyu Chen, Yan Xu, Yasheng Wang, Lifeng Shang, Benyou Wang
cs.AI

Abstract

Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, generating redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that Short CoT patterns offer concise reasoning efficiently, while Long CoT patterns excel in challenging scenarios where Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to adaptively employ both reasoning patterns: it prioritizes Short CoT patterns and activates Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50% while achieving performance comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT outperforms SFT in noisy, out-of-domain, and low-resource scenarios.
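To make the core idea concrete, below is a minimal sketch of how a QFFT training example might be constructed compared to a standard SFT example: SFT conditions on the question and computes loss over the response, whereas QFFT drops the question and trains on the Long CoT response alone. The tokenizer choice (`Qwen/Qwen2.5-7B`), function names, and label-masking convention are illustrative assumptions, not the authors' released code.

```python
# Sketch of SFT vs. QFFT example construction, assuming a HuggingFace-style
# tokenizer. Base model and data fields are assumptions for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # assumed base model

def build_sft_example(question: str, long_cot_response: str) -> dict:
    """Standard SFT: condition on the question; loss only on the response."""
    prompt_ids = tokenizer(question, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(long_cot_response, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + response_ids + [tokenizer.eos_token_id]
    # Mask question tokens with -100 so they are excluded from the loss.
    labels = [-100] * len(prompt_ids) + response_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}

def build_qfft_example(long_cot_response: str) -> dict:
    """QFFT: the input question is removed entirely; train on the response alone."""
    response_ids = tokenizer(long_cot_response, add_special_tokens=False)["input_ids"]
    input_ids = response_ids + [tokenizer.eos_token_id]
    # Every token contributes to the loss; there is no question prefix to mask.
    return {"input_ids": input_ids, "labels": list(input_ids)}
```

Under this framing, the only structural difference is the absence of the question prefix: because the model never learns to tie Long CoT behavior to a question, at inference time it can default to Short CoT patterns and fall back to Long CoT patterns only when needed.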