QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

Abstract

Recent advances in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but these models suffer from overthinking: they generate redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models and observes that Short CoT patterns provide concise, efficient reasoning, while Long CoT patterns excel in challenging scenarios where Short CoT patterns struggle. To let models leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to employ both reasoning patterns adaptively: it prioritizes Short CoT patterns and activates Long CoT patterns only when necessary. Experiments on various mathematical datasets show that QFFT reduces average response length by more than 50% while achieving performance comparable to Supervised Fine-Tuning (SFT). In addition, QFFT outperforms SFT in noisy, out-of-domain, and low-resource scenarios.
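The core difference between SFT and QFFT lies in how each training example is built. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a plain causal language-modeling setup where label positions set to -100 are skipped by the loss (the default `ignore_index` of PyTorch's cross-entropy), and it uses a hypothetical whitespace tokenizer (`toy_tokenize`) and hypothetical helpers (`build_sft_example`, `build_qfft_example`) in place of a real tokenizer and chat template. Standard SFT conditions on the question and trains only on the response; the QFFT-style example drops the question entirely, so the model sees and learns from the Long CoT response alone.

```python
from typing import Dict, List

# Label value skipped by the loss; -100 is PyTorch CrossEntropyLoss's default ignore_index.
IGNORE_INDEX = -100


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical whitespace tokenizer: assigns a new id to each unseen word."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def build_sft_example(question: str, response: str,
                      vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """Standard SFT: input is question + response; loss covers only the response."""
    q_ids = toy_tokenize(question, vocab)
    r_ids = toy_tokenize(response, vocab)
    return {
        "input_ids": q_ids + r_ids,
        # Question positions are masked out so they contribute no loss.
        "labels": [IGNORE_INDEX] * len(q_ids) + r_ids,
    }


def build_qfft_example(question: str, response: str,
                       vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """QFFT-style sketch: the question is removed; train on the Long CoT response alone."""
    del question  # intentionally unused: no question is shown during training
    r_ids = toy_tokenize(response, vocab)
    # Every response token is both input and target.
    return {"input_ids": list(r_ids), "labels": list(r_ids)}


# Usage with a made-up question and a made-up Long CoT-style response.
vocab: Dict[str, int] = {}
q = "What is 2 + 3 ?"
r = "Wait , let me check : 2 + 3 = 5 . Answer : 5"
sft = build_sft_example(q, r, vocab)
qfft = build_qfft_example(q, r, vocab)
print(len(sft["input_ids"]), len(qfft["input_ids"]))  # prints "21 15"
```

Because the QFFT example never pairs a question with the long response, the model is not taught a fixed mapping of "question implies long reasoning"; the abstract attributes QFFT's adaptive behavior to this removal of the input question.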
