QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

Abstract

Recent advances in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but these models suffer from overthinking: they generate redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that Short CoT patterns provide concise reasoning efficiently, while Long CoT patterns excel in challenging scenarios where Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to employ both reasoning patterns adaptively: it prioritizes Short CoT patterns and activates Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50% while achieving performance comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT outperforms SFT in noisy, out-of-domain, and low-resource scenarios.
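The difference between SFT and QFFT training examples can be illustrated with a minimal sketch. The abstract states only that QFFT removes the input question and learns exclusively from Long CoT responses; everything else here is an assumption made for illustration: the helper names (`toy_tokenize`, `build_sft_example`, `build_qfft_example`), the word-level stand-in tokenizer, and the common convention of masking question tokens in the SFT loss with the label value -100. Chat templates and special tokens are not modeled.

```python
from typing import Dict, List

# Assumed label value excluded from the loss (PyTorch's default ignore_index).
IGNORE_INDEX = -100


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one integer id per whitespace-separated word."""
    return [sum(ord(ch) for ch in word) % 1000 for word in text.split()]


def build_sft_example(question: str, response: str) -> Dict[str, List[int]]:
    """Assumed SFT layout: the model sees question + response,
    and the loss is computed on the response tokens only."""
    q_ids = toy_tokenize(question)
    r_ids = toy_tokenize(response)
    return {
        "input_ids": q_ids + r_ids,
        "labels": [IGNORE_INDEX] * len(q_ids) + r_ids,
    }


def build_qfft_example(question: str, response: str) -> Dict[str, List[int]]:
    """QFFT-style layout: the question is deliberately discarded,
    so the model is trained on the Long CoT response alone."""
    r_ids = toy_tokenize(response)
    return {"input_ids": list(r_ids), "labels": list(r_ids)}
```

Under these assumptions, both variants supervise the same response tokens; the only difference is whether the response is conditioned on the question during training.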
