OpenMask3D: Open-Vocabulary 3D Instance Segmentation
June 23, 2023
Authors: Ayça Takmaz, Elisabetta Fedele, Robert W. Sumner, Marc Pollefeys, Federico Tombari, Francis Engelmann
cs.AI
Abstract
We introduce the task of open-vocabulary 3D instance segmentation.
Traditional approaches for 3D instance segmentation largely rely on existing
annotated 3D datasets, which are restricted to a closed set of object categories.
This is an important limitation for real-life applications where one might need
to perform tasks guided by novel, open-vocabulary queries related to a wide
variety of objects. Recently, open-vocabulary 3D scene understanding methods
have emerged to address this problem by learning queryable features for each
point in the scene. While such a representation can be directly employed to
perform semantic segmentation, existing methods are limited in their ability
to identify individual object instances. In this work, we address this
limitation and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D
instance segmentation. Guided by predicted class-agnostic 3D instance masks,
our model aggregates per-mask features via multi-view fusion of CLIP-based
image embeddings. We conduct experiments and ablation studies on the ScanNet200
dataset to evaluate the performance of OpenMask3D, and provide insights about
the open-vocabulary 3D instance segmentation task. We show that our approach
outperforms other open-vocabulary counterparts, particularly on the long-tail
distribution. Furthermore, OpenMask3D goes beyond the limitations of
closed-vocabulary approaches and enables the segmentation of object instances
based on free-form queries describing object properties such as semantics,
geometry, affordances, and material properties.
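The central step described above (fusing per-view CLIP-based image embeddings into one feature per class-agnostic 3D instance mask, then matching masks against a free-form text query) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The per-view image embeddings and the text embedding are assumed to be precomputed by a CLIP-style image and text encoder and passed in as plain lists of floats. The view-selection rule (keep the `top_k` views with the most visible mask points), the mean fusion of L2-normalized embeddings, and the function names `fuse_mask_feature` and `rank_masks` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def fuse_mask_feature(view_embeddings: Sequence[Sequence[float]],
                      visible_points: Sequence[int],
                      top_k: int = 5) -> Vector:
    """Fuse the per-view image embeddings of ONE 3D instance mask.

    Assumed rule (hypothetical): keep the top_k views in which the most mask
    points are visible, average their L2-normalized embeddings, re-normalize.
    """
    if not view_embeddings:
        raise ValueError("no view embeddings given")
    if len(view_embeddings) != len(visible_points):
        raise ValueError("need one visibility count per view embedding")
    # Rank views by how many points of this mask they see (most visible first).
    ranked = sorted(range(len(view_embeddings)),
                    key=lambda i: visible_points[i], reverse=True)
    # Keep at most top_k views, skipping views where the mask is not visible.
    chosen = [l2_normalize(view_embeddings[i])
              for i in ranked[:top_k] if visible_points[i] > 0]
    if not chosen:
        raise ValueError("mask is not visible in any view")
    dim = len(chosen[0])
    # Element-wise mean of the unit-length view embeddings.
    mean = [sum(e[d] for e in chosen) / len(chosen) for d in range(dim)]
    return l2_normalize(mean)


def rank_masks(mask_features: Sequence[Sequence[float]],
               text_embedding: Sequence[float]) -> List[int]:
    """Return mask indices sorted by cosine similarity to a text query."""
    query = l2_normalize(text_embedding)
    # Cosine similarity = dot product of unit-length vectors.
    scores = [sum(f * q for f, q in zip(l2_normalize(feat), query))
              for feat in mask_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 3-D "embeddings" stand in for real CLIP features.
    chair = fuse_mask_feature([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], [100, 50])
    lamp = fuse_mask_feature([[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]], [80, 0])
    print(rank_masks([chair, lamp], [0.1, 1.0, 0.0]))  # prints [1, 0]
```

Real CLIP embeddings are much higher-dimensional; the toy vectors only show the ranking behaviour. Normalizing each view embedding before averaging keeps a single view with an unusually large embedding norm from dominating the fused mask feature, and because every mask is reduced to one vector, a free-form query such as a material or affordance description is answered with a single similarity score per instance.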