ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
September 28, 2023
Authors: Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
cs.AI
Abstract
For robots to perform a wide variety of tasks, they require a 3D
representation of the world that is semantically rich, yet compact and
efficient for task-driven perception and planning. Recent approaches have
attempted to leverage features from large vision-language models to encode
semantics in 3D representations. However, these approaches tend to produce maps
with per-point feature vectors, which do not scale well in larger environments,
nor do they contain semantic spatial relationships between entities in the
environment, which are useful for downstream planning. In this work, we propose
ConceptGraphs, an open-vocabulary graph-structured representation for 3D
scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing
their output to 3D by multi-view association. The resulting representations
generalize to novel semantic classes, without the need to collect large 3D
datasets or finetune models. We demonstrate the utility of this representation
through a number of downstream planning tasks that are specified through
abstract (language) prompts and require complex reasoning over spatial and
semantic concepts.
Project page: https://concept-graphs.github.io/
Explainer video: https://youtu.be/mRhNkQwRYnc
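
The abstract describes the construction only at a high level: per-view outputs of 2D models are fused into 3D objects by multi-view association, and the objects become nodes of a graph. The toy sketch below illustrates that general idea and is not the authors' implementation. Every name in it (`ObjectNode`, `SceneGraph`, `associate`, `add_relation`), the matching rule (centroid distance plus cosine similarity), the running-average fusion, and the thresholds `max_dist` and `min_sim` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ObjectNode:
    centroid: tuple  # (x, y, z) in a shared world frame
    feature: list    # semantic embedding, e.g. from a 2D vision-language model
    views: int = 1   # number of detections merged into this node


@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (i, j, relation) triples

    def associate(self, centroid, feature, max_dist=0.5, min_sim=0.8):
        """Merge a detection into a matching node, or create a new node.

        Hypothetical rule: a detection matches a node when it is both
        spatially close and semantically similar to that node.
        """
        for idx, node in enumerate(self.nodes):
            close = math.dist(node.centroid, centroid) <= max_dist
            similar = cosine(node.feature, feature) >= min_sim
            if close and similar:
                n = node.views
                # Running average over all views merged so far.
                node.centroid = tuple(
                    (c * n + d) / (n + 1) for c, d in zip(node.centroid, centroid)
                )
                node.feature = [
                    (f * n + g) / (n + 1) for f, g in zip(node.feature, feature)
                ]
                node.views = n + 1
                return idx
        self.nodes.append(ObjectNode(tuple(centroid), list(feature)))
        return len(self.nodes) - 1

    def add_relation(self, i, j, relation):
        """Record a labeled spatial or semantic relation between two nodes."""
        self.edges.append((i, j, relation))


graph = SceneGraph()
a = graph.associate((1.0, 0.0, 0.5), [1.0, 0.0])
b = graph.associate((1.1, 0.0, 0.5), [0.9, 0.1])  # second view of the same object
c = graph.associate((3.0, 2.0, 0.0), [0.0, 1.0])  # a different object
graph.add_relation(a, c, "near")
```

In this sketch the map stores one feature vector per object rather than one per point, which is the kind of compactness the abstract contrasts with per-point feature maps. The edge list is where relations between entities would be kept for a downstream planner to query.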