Does Spatial Cognition Emerge in Frontier Models?

Abstract

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.