Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advances in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by using supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead through verbose and redundant outputs, a problem known as the "overthinking phenomenon." In this paper, we provide the first structured survey to systematically investigate and explore current progress toward efficient reasoning in LLMs. Based on the inherent mechanisms of LLMs, we categorize existing work into several key directions: (1) model-based efficient reasoning, which optimizes full-length reasoning models into more concise ones or directly trains efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; and (3) input prompt-based efficient reasoning, which seeks to improve reasoning efficiency based on properties of the input prompt, such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.

