SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
August 29, 2024
作者: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng
cs.AI
Abstract
We introduce SAM2Point, a preliminary exploration that adapts Segment Anything Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. SAM2Point interprets any 3D data as a series of multi-directional videos and leverages SAM 2 for 3D-space segmentation, without further training or 2D-3D projection. Our framework supports various prompt types, including 3D points, boxes, and masks, and generalizes across diverse scenarios, such as 3D objects, indoor scenes, outdoor environments, and raw sparse LiDAR. Demonstrations on multiple 3D datasets, e.g., Objaverse, S3DIS, ScanNet, Semantic3D, and KITTI, highlight the robust generalization capabilities of SAM2Point. To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation. Online Demo: https://huggingface.co/spaces/ZiyuG/SAM2Point. Code: https://github.com/ZiyuGuo99/SAM2Point.
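
The sketch below is a minimal, hedged illustration of the idea described in the abstract: a 3D input is voxelized and then sliced at the prompt location into axis-aligned frame sequences ("multi-directional videos") that a video segmenter such as SAM 2 could propagate masks through. The helper names (`voxelize`, `multi_directional_videos`) and the fixed resolution are illustrative assumptions, not functions or settings from the SAM2Point codebase.

```python
# Illustrative sketch only: placeholder helpers, not the SAM2Point implementation.
import numpy as np

def voxelize(points, colors, resolution=64):
    """Rasterize a colored point cloud (N,3)+(N,3) into a dense RGB voxel grid (assumption)."""
    grid = np.zeros((resolution, resolution, resolution, 3), dtype=np.float32)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    idx = ((points - mins) / (maxs - mins + 1e-8) * (resolution - 1)).astype(int)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = colors
    return grid

def multi_directional_videos(grid, prompt_voxel):
    """Split the grid at the prompted voxel into six axis-aligned frame sequences."""
    videos = []
    for axis in range(3):                      # slice along x, y, and z
        p = prompt_voxel[axis]
        frames = np.moveaxis(grid, axis, 0)    # stack frames along this axis
        videos.append(frames[p:])              # forward pass from the prompt slice
        videos.append(frames[:p + 1][::-1])    # backward pass from the prompt slice
    return videos

# Each sequence would then be passed to SAM 2's video predictor, with the 3D
# prompt projected onto its first frame; the per-frame masks are finally merged
# back into a single 3D mask in voxel space.
```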