Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

August 18, 2025
Authors: Zhongang Cai, Yubo Wang, Qingping Sun, Ruisi Wang, Chenyang Gu, Wanqi Yin, Zhiqian Lin, Zhitao Yang, Chen Wei, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Jiaqi Li, Xiangyu Fan, Hanming Deng, Lewei Lu, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang
cs.AI

Abstract

Multi-modal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, capabilities that are fundamental to achieving artificial general intelligence. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models stand on the path toward spatial intelligence. First, we propose a comprehensive taxonomy of spatial tasks that unifies existing benchmarks and discuss the challenges in ensuring fair evaluation. We then evaluate state-of-the-art proprietary and open-source models on eight key benchmarks, at a cost exceeding one billion total tokens. Our empirical study reveals that (1) GPT-5 demonstrates unprecedented strength in spatial intelligence, yet (2) it still falls short of human performance across a broad spectrum of tasks. Moreover, we (3) identify the spatial intelligence problems that are more challenging for multi-modal models, and (4) find that proprietary models do not exhibit a decisive advantage when facing the most difficult problems. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans yet defeat even the most advanced multi-modal models.