ChatPaper.ai

From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

August 24, 2025
Authors: Shouwei Ruan, Liyuan Wang, Caixin Kang, Qihui Zhu, Songming Liu, Xingxing Wei, Hang Su
cs.AI

Abstract

Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.
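The abstract names two core operations without saying how they are implemented: turning egocentric observations gathered along a trajectory into an allocentric (world-frame) map, and retrieving stored spatial knowledge that matches a semantic goal. The sketch below illustrates those two ideas under assumptions of our own. It is not the BSC-Nav implementation. Poses are 2D (x, y, heading in radians), each landmark is observed as an egocentric offset (metres forward, metres to the left), and landmarks and goals carry small hand-written embedding vectors compared with cosine similarity. These vectors stand in for whatever learned features the paper actually uses. All names (Pose, CognitiveMap, to_allocentric, record_pose, add_landmark, retrieve) are hypothetical.

```python
import math
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not taken from the BSC-Nav paper.


@dataclass
class Pose:
    """Agent pose in the allocentric (world) frame; theta is the heading in radians."""
    x: float
    y: float
    theta: float


def to_allocentric(pose: Pose, forward: float, left: float) -> tuple:
    """Map an egocentric offset (metres ahead, metres to the left of the agent)
    into world coordinates: rotate by the heading, then translate by the position."""
    c, s = math.cos(pose.theta), math.sin(pose.theta)
    return (pose.x + c * forward - s * left,
            pose.y + s * forward + c * left)


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class CognitiveMap:
    # Route knowledge: world-frame positions in the order they were visited.
    route: list = field(default_factory=list)
    # Landmark knowledge: (label, embedding, world-frame (x, y)) triples.
    landmarks: list = field(default_factory=list)

    def record_pose(self, pose: Pose) -> None:
        self.route.append((pose.x, pose.y))

    def add_landmark(self, pose: Pose, label: str, embedding,
                     forward: float, left: float) -> None:
        # Store the cue at its allocentric position so it can be recalled
        # later from anywhere, independent of the pose it was seen from.
        self.landmarks.append((label, embedding, to_allocentric(pose, forward, left)))

    def retrieve(self, goal_embedding, k: int = 1):
        """Return the k landmarks whose embeddings best match the goal, with positions."""
        ranked = sorted(self.landmarks,
                        key=lambda lm: cosine(lm[1], goal_embedding),
                        reverse=True)
        return [(label, xy) for label, _, xy in ranked[:k]]


# Usage: two observations along a short trajectory, then a goal query.
m = CognitiveMap()
start = Pose(0.0, 0.0, 0.0)            # at the origin, facing +x
m.record_pose(start)
m.add_landmark(start, "sofa", [1.0, 0.0, 0.0], forward=2.0, left=0.0)
turned = Pose(2.0, 0.0, math.pi / 2)   # 2 m further along +x, now facing +y
m.record_pose(turned)
m.add_landmark(turned, "fridge", [0.0, 1.0, 0.0], forward=3.0, left=1.0)
print(m.retrieve([0.1, 0.9, 0.0]))     # the fridge ranks first, near world (1, 3)
```

In this toy setup the route list plays the role of route knowledge and the stored landmarks are the salient cues. Because every landmark is kept in one shared world frame, the agent can recall where the fridge is without replaying its trajectory, which is the map-like property the abstract attributes to survey knowledge. A real system would also have to handle pose noise, 3D geometry, and repeated sightings of the same object. This sketch attempts none of these.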
September 2, 2025