The Reasoning Trap: Logical Reasoning as a Mechanistic Pathway to Situational Awareness

Abstract

Situational awareness, the capacity of an AI system to recognize its own nature, understand its training and deployment context, and reason strategically about its circumstances, is widely considered among the most dangerous emergent capabilities in advanced AI systems. Separately, a growing research effort seeks to improve the logical reasoning capabilities of large language models (LLMs) across deduction, induction, and abduction. In this paper, we argue that these two research trajectories are on a collision course. We introduce the RAISE framework (Reasoning Advancing Into Self Examination), which identifies three mechanistic pathways through which improvements in logical reasoning enable progressively deeper levels of situational awareness: deductive self-inference, inductive context recognition, and abductive self-modeling. We formalize each pathway, construct an escalation ladder from basic self-recognition to strategic deception, and show that every major research topic in LLM logical reasoning maps directly onto a specific amplifier of situational awareness. We further analyze why current safety measures are insufficient to prevent this escalation. We conclude by proposing concrete safeguards, including a "Mirror Test" benchmark and a Reasoning Safety Parity Principle, and we pose an uncomfortable but necessary question to the logical reasoning community about its responsibility in this trajectory.

