A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Abstract

Recent Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference. However, a growing concern is their tendency to produce excessively long reasoning traces, which are often filled with redundant content (e.g., repeated definitions), over-analysis of simple problems, and superficial exploration of multiple reasoning paths for harder tasks. This inefficiency introduces significant challenges for training, inference, and real-world deployment (e.g., in agent-based systems), where token economy is critical. In this survey, we provide a comprehensive overview of recent efforts aimed at improving reasoning efficiency in LRMs, with a particular focus on the unique challenges that arise in this new paradigm. We identify common patterns of inefficiency, examine methods proposed across the LRM lifecycle, i.e., from pretraining to inference, and discuss promising future directions for research. To support ongoing development, we also maintain a real-time GitHub repository tracking recent progress in the field. We hope this survey serves as a foundation for further exploration and inspires innovation in this rapidly evolving area.
