Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

Abstract

Recent advances in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, shifting them from fast, intuitive thinking (System 1) toward slow, deep reasoning (System 2). While System 2 reasoning improves task accuracy, it often incurs substantial computational cost, both because it is slow by nature and because it includes inefficient or unnecessary reasoning behaviors. In contrast, System 1 reasoning is computationally efficient but yields suboptimal performance. It is therefore critical to balance the trade-off between performance (benefits) and computational cost (budgets), which gives rise to the concept of reasoning economy. In this survey, we provide a comprehensive analysis of reasoning economy in both the post-training and test-time inference stages of LLMs, covering i) the causes of reasoning inefficiency, ii) behavior analysis of different reasoning patterns, and iii) potential solutions for achieving reasoning economy. By offering actionable insights and highlighting open challenges, we aim to shed light on strategies for improving the reasoning economy of LLMs and to serve as a valuable resource for advancing research in this evolving area. We also provide a public repository to continually track developments in this fast-evolving field.