Does Spatial Cognition Emerge in Frontier Models?

Abstract

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.