
LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation

February 2, 2026
Authors: Bo Miao, Weijia Liu, Jun Luo, Lachlan Shinnick, Jian Liu, Thomas Hamilton-Smith, Yuhe Yang, Zijie Wu, Vanja Videnovic, Feras Dayoub, Anton van den Hengel
cs.AI

Abstract

The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a multi-granularity, open-vocabulary goal navigation task where agents interpret natural language instructions to reach targets at four semantic levels: scene, room, region, and instance. To this end, we present Language as a Map (LangMap), a large-scale benchmark built on real-world 3D indoor scans with comprehensive human-verified annotations and tasks spanning these levels. LangMap provides region labels, discriminative region descriptions, discriminative instance descriptions covering 414 object categories, and over 18K navigation tasks. Each target features both concise and detailed descriptions, enabling evaluation across different instruction styles. LangMap achieves superior annotation quality, outperforming GOAT-Bench by 23.8% in discriminative accuracy while using a quarter as many words. Comprehensive evaluations of zero-shot and supervised models on LangMap reveal that richer context and memory improve success, while long-tailed, small, context-dependent, and distant goals, as well as multi-goal completion, remain challenging. HieraNav and LangMap establish a rigorous testbed for advancing language-driven embodied navigation. Project: https://bo-miao.github.io/LangMap