Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

March 20, 2025
Authors: Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the "overthinking phenomenon". In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, based on the inherent mechanisms of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise variants or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompt-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking.
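
To make direction (3) concrete, below is a minimal, hypothetical sketch of prompt-based length control: a difficulty-dependent token budget is written directly into the prompt to steer the model toward shorter chains of thought. The function names (`build_budgeted_prompt`, `call_llm`, `answer_with_budget`) and the budget values are illustrative assumptions, not methods from the survey.

```python
# Illustrative sketch only (not from the survey): "input prompt-based efficient
# reasoning" via an explicit token budget written into the prompt.
# All names and budget values here are hypothetical.

def build_budgeted_prompt(question: str, token_budget: int) -> str:
    """Prepend an explicit reasoning budget so the model produces concise CoT."""
    return (
        "Answer the question below. Keep your step-by-step reasoning within "
        f"roughly {token_budget} tokens, then state the final answer.\n\n"
        f"Question: {question}"
    )


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client; swap in a real model call."""
    return f"[model response to a {len(prompt)}-character prompt]"


def answer_with_budget(question: str, difficulty: str = "medium") -> str:
    """Scale the reasoning budget with estimated difficulty: easy questions
    get a tight budget, hard ones get more room to reason."""
    budget = {"easy": 50, "medium": 150, "hard": 400}.get(difficulty, 150)
    return call_llm(build_budgeted_prompt(question, budget))


if __name__ == "__main__":
    print(answer_with_budget("What is 17 * 24?", difficulty="easy"))
```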
