
Efficient Inference for Large Reasoning Models: A Survey

March 29, 2025
Authors: Yue Liu, Jiaying Wu, Yufei He, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi
cs.AI

Abstract

Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason, exhibiting promising performance in complex task-solving. However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time. Thus, this survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving reasoning quality. First, we introduce a taxonomy that groups recent methods into two main categories: (a) explicit compact Chain-of-Thought (CoT), which reduces tokens while keeping the explicit reasoning structure, and (b) implicit latent CoT, which encodes reasoning steps within hidden representations instead of explicit tokens. We also discuss their strengths and weaknesses. Then, we conduct empirical analyses of existing methods from the performance and efficiency perspectives. Furthermore, we present open challenges in this field, including human-centric controllable reasoning, the trade-off between interpretability and efficiency of reasoning, ensuring the safety of efficient reasoning, and broader applications of efficient reasoning. In addition, we highlight key insights for enhancing LRMs' inference efficiency via techniques such as model merging, new architectures, and agent routers. We hope this work serves as a valuable guide, helping researchers overcome challenges in this vibrant field. Code: https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs
