LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation
February 2, 2026
Authors: Bo Miao, Weijia Liu, Jun Luo, Lachlan Shinnick, Jian Liu, Thomas Hamilton-Smith, Yuhe Yang, Zijie Wu, Vanja Videnovic, Feras Dayoub, Anton van den Hengel
cs.AI
Abstract
The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a multi-granularity, open-vocabulary goal navigation task in which agents interpret natural language instructions to reach targets at four semantic levels: scene, room, region, and instance. To this end, we present Language as a Map (LangMap), a large-scale benchmark built on real-world 3D indoor scans, with comprehensive human-verified annotations and tasks spanning these levels. LangMap provides region labels, discriminative region descriptions, discriminative instance descriptions covering 414 object categories, and over 18K navigation tasks. Each target has both a concise and a detailed description, enabling evaluation across different instruction styles. LangMap achieves superior annotation quality, outperforming GOAT-Bench by 23.8% in discriminative accuracy while using only a quarter as many words. Comprehensive evaluations of zero-shot and supervised models on LangMap reveal that richer context and memory improve success, while long-tailed, small, context-dependent, and distant goals, as well as multi-goal completion, remain challenging. HieraNav and LangMap establish a rigorous testbed for advancing language-driven embodied navigation. Project: https://bo-miao.github.io/LangMap