Does Spatial Cognition Emerge in Frontier Models?

Abstract

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.