Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

June 6, 2025
Authors: Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, Yanchang Li, Zhibin Li, Guangming Liu, Kairui Liu, Lihao Liu, Weizhi Liu, Xiaoshun Liu, Yufei Liu, Yunfei Liu, Qiang Lu, Yuanfei Luo, Xiang Lv, Hongying Ma, Sai Ma, Lingxian Mi, Sha Sa, Hongxiang Shu, Lei Tian, Chengzhi Wang, Jiayu Wang, Kaijie Wang, Qingyi Wang, Renwen Wang, Tao Wang, Wei Wang, Xirui Wang, Chao Wei, Xuguang Wei, Zijun Xia, Zhaohao Xiao, Tingshuai Yan, Liyan Yang, Yifan Yang, Zhikai Yang, Zhong Yin, Li Yuan, Liuchun Yuan, Chi Zhang, Jinyang Zhang, Junhui Zhang, Linge Zhang, Zhenyi Zhang, Zheyu Zhang, Dongjie Zhu, Hang Li, Yangang Zhang
cs.AI

Abstract

Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules built from small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture for mobile robot navigation consisting of Astra-Global and Astra-Local. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self- and goal localization using a hybrid topological-semantic graph as the global map, and outperforms traditional visual place recognition methods. Astra-Local, a multitask network, handles local path planning and odometry estimation. Its 4D spatial-temporal encoder, trained through self-supervised learning, generates robust 4D features for downstream tasks. The planning head uses flow matching and a novel masked ESDF (Euclidean signed distance field) loss to generate local trajectories with minimal collision risk, and the odometry head integrates multi-sensor inputs via a transformer encoder to predict the relative pose of the robot. Deployed on real in-house mobile robots, Astra achieves a high end-to-end mission success rate across diverse indoor environments.
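The abstract names flow matching and a masked ESDF loss as the planning head's ingredients but gives no formulas. As a rough illustration only, the Python sketch below shows generic versions of these pieces under our own assumptions: a straight-line (rectified-flow-style) conditional path from Gaussian noise to a 2-D waypoint trajectory, Euler integration for sampling, and a hinge collision penalty on ESDF values. All helper names, the `oracle_velocity` stand-in for a trained network, and the masking rule (skip waypoints outside the grid or in cells flagged as untrusted) are hypothetical; this is not Astra's actual planning head or loss.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Traj = List[Point]
VelocityFn = Callable[[Traj, float], Traj]


def flow_matching_pair(x1: Traj, rng: random.Random) -> Tuple[Traj, float, Traj]:
    """One conditional flow-matching training sample on a straight-line path.

    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I); the regression target is
    the path's constant velocity x1 - x0.
    """
    x0 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in x1]
    t = rng.random()
    xt = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1]) for a, b in zip(x0, x1)]
    v_target = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(x0, x1)]
    return xt, t, v_target


def flow_matching_loss(v_pred: Traj, v_target: Traj) -> float:
    """Mean squared error over all velocity components."""
    errs = [(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(v_pred, v_target)]
    return sum(errs) / (2 * len(errs))


def sample_trajectory(velocity: VelocityFn, n: int, rng: random.Random, steps: int = 10) -> Traj:
    """Euler-integrate dx/dt = v(x, t) from noise at t = 0 to a trajectory at t = 1."""
    x = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [(p[0] + dt * q[0], p[1] + dt * q[1]) for p, q in zip(x, v)]
    return x


def oracle_velocity(x1: Traj) -> VelocityFn:
    """Hypothetical stand-in for a trained network: exact velocity toward one known target."""
    def v(x: Traj, t: float) -> Traj:
        return [((b[0] - a[0]) / (1 - t), (b[1] - a[1]) / (1 - t)) for a, b in zip(x, x1)]
    return v


def masked_esdf_loss(traj: Traj, esdf: List[List[float]], valid: List[List[bool]],
                     resolution: float, margin: float) -> float:
    """Hinge penalty max(0, margin - d) averaged over unmasked waypoints.

    esdf[r][c] is the signed distance (m) at grid cell (r, c). Assumed convention:
    a waypoint's x maps to grid rows and y to grid columns. Hypothetical masking
    rule: waypoints outside the grid or in cells with valid[r][c] == False are ignored.
    """
    penalties = []
    for px, py in traj:
        r, c = math.floor(px / resolution), math.floor(py / resolution)
        if 0 <= r < len(esdf) and 0 <= c < len(esdf[0]) and valid[r][c]:
            penalties.append(max(0.0, margin - esdf[r][c]))
    return sum(penalties) / len(penalties) if penalties else 0.0
```

In training, a learned network would replace `oracle_velocity`, and the two losses would be combined with some weighting the abstract does not specify. With the oracle, the final Euler step lands on the target trajectory, which is a quick sanity check of the integration loop.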
June 10, 2025