Efficient Reasoning Models: A Survey

Abstract

Reasoning models have made remarkable progress on complex, logic-intensive tasks by generating extended Chain-of-Thoughts (CoTs) before arriving at a final answer. However, this "slow-thinking" paradigm generates many tokens in sequence and therefore incurs substantial computational overhead, which creates an urgent need for effective acceleration. This survey aims to provide a comprehensive overview of recent advances in efficient reasoning. It categorizes existing work into three key directions: (1) shorter: compressing lengthy CoTs into concise yet effective reasoning chains; (2) smaller: developing compact language models with strong reasoning capabilities through knowledge distillation, other model compression techniques, and reinforcement learning; and (3) faster: designing efficient decoding strategies to accelerate inference. A curated collection of the papers discussed in this survey is available in our GitHub repository.
