Large Language Model Reasoning Failures

Abstract

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To systematically understand and address these shortcomings, we present the first comprehensive survey dedicated to reasoning failures in LLMs. We introduce a novel categorization framework that divides reasoning into embodied and non-embodied types, with the latter further subdivided into informal (intuitive) and formal (logical) reasoning. In parallel, we classify reasoning failures along a complementary axis into three types: fundamental failures intrinsic to LLM architectures that broadly affect downstream tasks; application-specific limitations that manifest in particular domains; and robustness issues characterized by inconsistent performance across minor variations. For each reasoning failure, we provide a clear definition, analyze existing studies, explore root causes, and present mitigation strategies. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and more robust reasoning capabilities. We additionally release a comprehensive collection of research works on LLM reasoning failures as a GitHub repository at https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failures, to provide an easy entry point to this area.
