Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
June 6, 2025
Authors: Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, Yanchang Li, Zhibin Li, Guangming Liu, Kairui Liu, Lihao Liu, Weizhi Liu, Xiaoshun Liu, Yufei Liu, Yunfei Liu, Qiang Lu, Yuanfei Luo, Xiang Lv, Hongying Ma, Sai Ma, Lingxian Mi, Sha Sa, Hongxiang Shu, Lei Tian, Chengzhi Wang, Jiayu Wang, Kaijie Wang, Qingyi Wang, Renwen Wang, Tao Wang, Wei Wang, Xirui Wang, Chao Wei, Xuguang Wei, Zijun Xia, Zhaohao Xiao, Tingshuai Yan, Liyan Yang, Yifan Yang, Zhikai Yang, Zhong Yin, Li Yuan, Liuchun Yuan, Chi Zhang, Jinyang Zhang, Junhui Zhang, Linge Zhang, Zhenyi Zhang, Zheyu Zhang, Dongjie Zhu, Hang Li, Yangang Zhang
cs.AI
Abstract
Modern robot navigation systems encounter difficulties in diverse and complex
indoor environments. Traditional approaches rely on multiple modules with small
models or rule-based systems and thus lack adaptability to new environments. To
address this, we developed Astra, a comprehensive dual-model architecture for
mobile robot navigation consisting of Astra-Global and Astra-Local. Astra-Global, a
multimodal LLM, processes vision and language inputs to perform self and goal
localization using a hybrid topological-semantic graph as the global map, and
outperforms traditional visual place recognition methods. Astra-Local, a
multitask network, handles local path planning and odometry estimation. Its 4D
spatial-temporal encoder, trained through self-supervised learning, generates
robust 4D features for downstream tasks. The planning head utilizes flow
matching and a novel masked ESDF loss to minimize collision risks for
generating local trajectories, and the odometry head integrates multi-sensor
inputs via a transformer encoder to predict the relative pose of the robot.
Deployed on real in-house mobile robots, Astra achieves a high end-to-end
mission success rate across diverse indoor environments.
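The abstract says the planning head combines flow matching with a masked ESDF loss, but it gives no formulas. To make those two ingredients concrete, here is a toy sketch under stated assumptions. It shows the standard conditional flow-matching objective with a straight-line (rectified-flow) path, Euler sampling from a starting sample to a trajectory, and a generic hinge-style collision penalty on ESDF distances that is restricted by a validity mask. Several pieces are hypothetical illustrations and should not be read as Astra's actual loss, network, or masking rule: the 2D waypoint representation, every function name, the hinge form, the `margin` parameter, and the `valid` mask predicate.

```python
import math
from typing import Callable

# Hypothetical representation: a trajectory is a list of 2D waypoints.
Point = tuple[float, float]
Trajectory = list[Point]
VelocityField = Callable[[Trajectory, float], Trajectory]


def interpolate(x0: Trajectory, x1: Trajectory, t: float) -> Trajectory:
    """Point on the straight-line probability path: x_t = (1 - t) * x0 + t * x1."""
    return [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
            for a, b in zip(x0, x1)]


def target_velocity(x0: Trajectory, x1: Trajectory) -> Trajectory:
    """Conditional target velocity for that path, u = x1 - x0 (constant in t)."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(x0, x1)]


def flow_matching_loss(v_theta: VelocityField, x0: Trajectory,
                       x1: Trajectory, t: float) -> float:
    """Squared error between predicted and target velocity at time t,
    summed over x and y and averaged over waypoints."""
    xt = interpolate(x0, x1, t)
    u = target_velocity(x0, x1)
    v = v_theta(xt, t)
    return sum((vp[0] - up[0]) ** 2 + (vp[1] - up[1]) ** 2
               for vp, up in zip(v, u)) / len(u)


def masked_esdf_penalty(traj: Trajectory, esdf: Callable[[Point], float],
                        valid: Callable[[Point], bool], margin: float) -> float:
    """Hypothetical collision penalty max(0, margin - d(p)), averaged only over
    waypoints for which the mask says the ESDF value is valid."""
    terms = [max(0.0, margin - esdf(p)) for p in traj if valid(p)]
    return sum(terms) / len(terms) if terms else 0.0


def sample(v_theta: VelocityField, x0: Trajectory, steps: int = 10) -> Trajectory:
    """Generate a trajectory by Euler-integrating dx/dt = v_theta(x, t) from t=0 to t=1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = v_theta(x, k * dt)
        x = [(p[0] + dt * q[0], p[1] + dt * q[1]) for p, q in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy scene: a single circular obstacle of radius 1 centred at the origin.
    esdf = lambda p: math.hypot(p[0], p[1]) - 1.0
    # Hypothetical mask: trust the ESDF only inside a 10 x 10 mapped region.
    valid = lambda p: abs(p[0]) <= 5 and abs(p[1]) <= 5
    x0 = [(0.0, 0.0)] * 3                          # stand-in for a Gaussian noise sample
    x1 = [(2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]      # stand-in for an expert trajectory
    oracle = lambda x, t: target_velocity(x0, x1)  # ideal predictor for this single pair
    print(flow_matching_loss(oracle, x0, x1, t=0.3))
    print(sample(oracle, x0, steps=4))
    print(masked_esdf_penalty([(0.5, 0.0), (10.0, 0.0)], esdf, valid, margin=0.5))
```

In a real planner, `v_theta` would be a learned network conditioned on the encoder's features, and `x0` would be drawn from noise. Training would also sum the two terms with some weighting. The sketch fixes those pieces only so that the mechanics stay visible. The mask lets the collision term ignore waypoints whose distance values are unreliable. In the demo, the point at (10, 0) falls outside the mapped region, so it contributes nothing to the penalty.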