Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by using supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, a problem known as the "overthinking phenomenon". In this paper, we provide the first structured survey that systematically investigates current progress toward efficient reasoning in LLMs. Based on the inherent mechanisms of LLMs, we categorize existing work into three key directions: (1) model-based efficient reasoning, which optimizes full-length reasoning models into more concise reasoning models or directly trains efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce the number and length of reasoning steps during inference; and (3) input prompt-based efficient reasoning, which seeks to improve reasoning efficiency based on properties of the input prompt, such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.
