Efficient Inference for Large Reasoning Models: A Survey
March 29, 2025
Authors: Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi
cs.AI
Abstract
Large Reasoning Models (LRMs) significantly improve the reasoning ability of
Large Language Models (LLMs) by learning to reason, exhibiting promising
performance in complex task-solving. However, their deliberative reasoning
process leads to inefficiencies in token usage, memory consumption, and
inference time. Thus, this survey provides a review of efficient inference
methods designed specifically for LRMs, focusing on mitigating token
inefficiency while preserving the reasoning quality. First, we introduce a
taxonomy to group the recent methods into two main categories: (a) explicit
compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit
reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps
within hidden representations instead of explicit tokens. Meanwhile, we discuss
their strengths and weaknesses. Then, we conduct empirical analyses on existing
methods from performance and efficiency aspects. Besides, we present open
challenges in this field, including human-centric controllable reasoning,
trade-off between interpretability and efficiency of reasoning, ensuring safety
of efficient reasoning, and broader applications of efficient reasoning. In
addition, we highlight key insights for enhancing LRMs' inference efficiency
via techniques such as model merging, new architectures, and agent routers. We
hope this work serves as a valuable guide, helping researchers overcome
challenges in this vibrant field (https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs).
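To make the token-inefficiency motivation concrete, the sketch below (not from the survey; the traces and the whitespace tokenizer are illustrative stand-ins for real model outputs and a real tokenizer) compares the token budget of a verbose chain-of-thought trace against a compact one that preserves the same explicit reasoning steps, in the spirit of category (a):

```python
# Illustrative sketch: token savings from compact vs. verbose CoT.
# Whitespace splitting stands in for a real model tokenizer.

def count_tokens(text: str) -> int:
    """Crude token count; real systems would use the model's tokenizer."""
    return len(text.split())

# Hypothetical reasoning traces for the same arithmetic question.
full_cot = (
    "First, note that 17 times 3 equals 51. "
    "Next, add 49 to 51, which gives 100. "
    "Therefore, the final answer is 100."
)
compact_cot = "17*3=51; 51+49=100. Answer: 100."

full_len = count_tokens(full_cot)
compact_len = count_tokens(compact_cot)
savings = 1 - compact_len / full_len
print(f"full: {full_len} tokens, compact: {compact_len} tokens, "
      f"saved {savings:.0%}")
```

Both traces keep an explicit, human-readable reasoning structure; implicit latent CoT (category b) would instead drop the textual trace entirely and carry the intermediate steps in hidden states.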