OpenMask3D: Open-Vocabulary 3D Instance Segmentation

Abstract

We introduce the task of open-vocabulary 3D instance segmentation. Traditional approaches to 3D instance segmentation largely rely on existing 3D annotated datasets, which are restricted to a closed set of object categories. This is an important limitation for real-life applications, where one might need to perform tasks guided by novel, open-vocabulary queries about a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods are limited in their ability to identify object instances. In this work, we address this limitation and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. We conduct experiments and ablation studies on the ScanNet200 dataset to evaluate the performance of OpenMask3D and to provide insights into the open-vocabulary 3D instance segmentation task. We show that our approach outperforms other open-vocabulary counterparts, particularly on the long-tail distribution. Furthermore, OpenMask3D goes beyond the limitations of closed-vocabulary approaches and enables the segmentation of object instances based on free-form queries describing object properties such as semantics, geometry, affordances, and material properties.
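To make the per-mask aggregation and query steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each predicted instance mask already has one image embedding per view in which it is visible, that fusion is a plain average of L2-normalized embeddings, and that a text query is scored against each mask by cosine similarity. The function names (`fuse_views`, `rank_masks`) and the 3-dimensional toy vectors are hypothetical stand-ins for real CLIP embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_views(view_embeddings):
    """Fuse per-view embeddings of one mask into a single feature.

    Assumption: fusion is the mean of L2-normalized view embeddings,
    re-normalized afterwards. The abstract only says "multi-view fusion".
    """
    if not view_embeddings:
        raise ValueError("a mask needs at least one view embedding")
    normalized = [l2_normalize(v) for v in view_embeddings]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def rank_masks(mask_features, query_embedding):
    """Rank masks by cosine similarity between mask feature and query."""
    query = l2_normalize(query_embedding)
    scores = {
        mask_id: sum(a * b for a, b in zip(feature, query))
        for mask_id, feature in mask_features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for per-view image embeddings of two class-agnostic masks.
views_per_mask = {
    "mask_0": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "mask_1": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
mask_features = {m: fuse_views(v) for m, v in views_per_mask.items()}

# Toy stand-in for the text embedding of a free-form query.
ranking = rank_masks(mask_features, [1.0, 0.0, 0.0])
```

Because every mask carries a single feature vector in the same space as the query, the same ranking step would apply to any free-form text query without retraining, which is what makes the setup zero-shot in this sketch.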
