When AI Navigates the Fog of War

Abstract

Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-data leakage. We address this challenge through a temporally grounded case study of the early stages of the 2026 Middle East conflict, which unfolded after the training cutoff of current frontier models. We construct 11 critical temporal nodes, 42 node-specific verifiable questions, and 5 general exploratory questions, requiring models to reason only from information that would have been publicly available at each moment. This design substantially mitigates training-data leakage concerns, creating a setting well-suited for studying how models analyze an unfolding crisis under the fog of war, and provides, to our knowledge, the first temporally grounded analysis of LLM reasoning in an ongoing geopolitical conflict. Our analysis reveals three main findings. First, current state-of-the-art large language models often display a striking degree of strategic realism, reasoning beyond surface rhetoric toward deeper structural incentives. Second, this capability is uneven across domains: models are more reliable in economically and logistically structured settings than in politically ambiguous multi-actor environments. Finally, model narratives evolve over time, shifting from early expectations of rapid containment toward more systemic accounts of regional entrenchment and attritional de-escalation. Since the conflict remains ongoing at the time of writing, this work can serve as an archival snapshot of model reasoning during an unfolding geopolitical crisis, enabling future studies without the hindsight bias of retrospective analysis.
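The temporal gating described above, where a model may reason only from information publicly available at a given node, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the names `Document`, `TemporalNode`, `visible_documents`, and `build_prompt`, the prompt wording, and the placeholder dates are hypothetical and are not taken from the study's actual pipeline.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Document:
    # Hypothetical record for one publicly available source.
    title: str
    published_at: datetime


@dataclass
class TemporalNode:
    # Hypothetical record for one temporal node and its questions.
    name: str
    cutoff: datetime
    questions: list[str] = field(default_factory=list)


def visible_documents(node: TemporalNode, corpus: list[Document]) -> list[Document]:
    """Return documents published at or before the node's cutoff, oldest first."""
    visible = [doc for doc in corpus if doc.published_at <= node.cutoff]
    return sorted(visible, key=lambda doc: doc.published_at)


def build_prompt(node: TemporalNode, corpus: list[Document], question: str) -> str:
    """Assemble a prompt that exposes only sources visible at the node."""
    context = "\n".join(
        f"- [{doc.published_at:%Y-%m-%d}] {doc.title}"
        for doc in visible_documents(node, corpus)
    )
    return (
        f"As of {node.cutoff:%Y-%m-%d}, answer using only the sources shown here:\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Placeholder dates and titles, not events from the study.
    corpus = [
        Document("future report", datetime(2026, 1, 20)),
        Document("early report", datetime(2026, 1, 5)),
    ]
    node = TemporalNode("node-1", datetime(2026, 1, 10), ["Placeholder question?"])
    print(build_prompt(node, corpus, node.questions[0]))
```

Filtering on publication time at prompt-construction time is only one way to approximate the constraint; it does nothing about knowledge already in the model's weights, which is why the study relies on events that postdate the models' training cutoff.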
