When AI Navigates the Fog of War

Abstract

Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-data leakage. We address this challenge through a temporally grounded case study of the early stages of the 2026 Middle East conflict, which unfolded after the training cutoff of current frontier models. We construct 11 critical temporal nodes, 42 node-specific verifiable questions, and 5 general exploratory questions, requiring models to reason only from information that would have been publicly available at each moment. This design substantially mitigates training-data leakage concerns, creating a setting well-suited for studying how models analyze an unfolding crisis under the fog of war, and provides, to our knowledge, the first temporally grounded analysis of LLM reasoning in an ongoing geopolitical conflict. Our analysis reveals three main findings. First, current state-of-the-art large language models often display a striking degree of strategic realism, reasoning beyond surface rhetoric toward deeper structural incentives. Second, this capability is uneven across domains: models are more reliable in economically and logistically structured settings than in politically ambiguous multi-actor environments. Finally, model narratives evolve over time, shifting from early expectations of rapid containment toward more systemic accounts of regional entrenchment and attritional de-escalation. Since the conflict remains ongoing at the time of writing, this work can serve as an archival snapshot of model reasoning during an unfolding geopolitical crisis, enabling future studies without the hindsight bias of retrospective analysis.