From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

August 24, 2025
Authors: Shouwei Ruan, Liyuan Wang, Caixin Kang, Qihui Zhu, Songming Liu, Xingxing Wei, Hang Su
cs.AI

Abstract

Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.