Does Spatial Cognition Emerge in Frontier Models?

Abstract

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.