ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
February 12, 2026
Authors: Zedong Chu, Shichao Xie, Xiaolong Wu, Yanfen Shen, Minghua Luo, Zhengbo Wang, Fei Liu, Xiaoxu Leng, Junjun Hu, Mingyang Yin, Jia Lu, Yingnan Guo, Kai Yang, Jiawei Han, Xu Chen, Yanqing Zhu, Yuxiang Zhao, Xin Liu, Yirong Yang, Ye He, Jiahang Wang, Yang Cai, Tianlin Zhang, Li Gao, Liu Liu, Mingchao Sun, Fan Jiang, Chiyu Wang, Zhicheng Liu, Hongyu Pan, Honglin Han, Zhining Gu, Kuan Yang, Jianfang Zhang, Di Jing, Zihao Guan, Wei Guo, Guoqing Liu, Di Yang, Xiangpo Yang, Menglin Yang, Hongguang Xing, Weiguo Li, Mu Xu
cs.AI
Abstract
Embodied navigation has long been fragmented by task-specific architectures. We introduce ABot-N0, a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core tasks: Point-Goal, Object-Goal, Instruction-Following, POI-Goal, and Person-Following. ABot-N0 uses a hierarchical "Brain-Action" architecture, pairing an LLM-based Cognitive Brain for semantic reasoning with a Flow Matching-based Action Expert for precise, continuous trajectory generation.
To support large-scale learning, we developed the ABot-N0 Data Engine, curating 16.9M expert trajectories and 5.0M reasoning samples across 7,802 high-fidelity 3D scenes (10.7 km² in total). ABot-N0 achieves new state-of-the-art (SOTA) performance across 7 benchmarks, significantly outperforming specialized models. Furthermore, our Agentic Navigation System integrates a planner with hierarchical topological memory, enabling robust, long-horizon missions in dynamic real-world environments.
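
As a rough illustration of what Flow Matching-based trajectory generation means in general, the sketch below integrates a velocity field from Gaussian noise at t = 0 to a set of waypoints at t = 1 using Euler steps. It is not ABot-N0's implementation: the abstract does not specify the Action Expert's network, conditioning, or sampler. The function names, the waypoint values, and the closed-form `oracle_velocity` (which stands in for a learned network conditioned on the Cognitive Brain's output) are all assumptions made for illustration.

```python
import random


def oracle_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network v(x, t | context).

    For the linear path x_t = (1 - t) * x0 + t * x1 with a single fixed
    target x1, the exact flow-matching velocity is (x1 - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_trajectory(target, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # x0 drawn from N(0, I)
    for k in range(num_steps):
        t = k / num_steps  # t stays below 1, so (1 - t) is never zero
        v = oracle_velocity(x, t, target)
        x = [xi + vi / num_steps for xi, vi in zip(x, v)]
    return x


# Hypothetical flattened waypoints (x1, y1, x2, y2, x3, y3) in metres.
target_waypoints = [0.5, 0.0, 1.0, 0.1, 1.5, 0.3]
waypoints = sample_trajectory(target_waypoints)
```

In a real model the velocity would come from a trained network, and different noise draws would yield different plausible trajectories. Here the single fixed target makes every sample converge to the same waypoints, which keeps the example easy to check.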