SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
August 29, 2024
Authors: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng
cs.AI
Abstract
We introduce SAM2Point, a preliminary exploration adapting Segment Anything
Model 2 (SAM 2) for zero-shot and promptable 3D segmentation. SAM2Point
interprets any 3D data as a series of multi-directional videos, and leverages
SAM 2 for 3D-space segmentation, without further training or 2D-3D projection.
Our framework supports various prompt types, including 3D points, boxes, and
masks, and can generalize across diverse scenarios, such as 3D objects, indoor
scenes, outdoor environments, and raw sparse LiDAR data. Demonstrations on
multiple 3D datasets, e.g., Objaverse, S3DIS, ScanNet, Semantic3D, and KITTI,
highlight the robust generalization capabilities of SAM2Point. To the best of
our knowledge, we present the most faithful implementation of SAM in 3D, which
may serve as a starting point for future research in promptable 3D
segmentation. Online Demo: https://huggingface.co/spaces/ZiyuG/SAM2Point .
Code: https://github.com/ZiyuGuo99/SAM2Point .
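
To make the "any 3D data as multi-directional videos" idea concrete, the following is a minimal Python sketch under stated assumptions: a colored point cloud is voxelized into a dense RGB grid, which is then sliced along each spatial axis in both directions, yielding six video-like frame stacks that a video segmentation model such as SAM 2 could process frame by frame. The voxelize and multi_directional_videos helpers and all parameter choices here are illustrative assumptions, not the authors' released API.

import numpy as np

def voxelize(points, colors, resolution=64):
    # Map an (N, 3) point cloud with (N, 3) RGB colors onto a dense
    # (R, R, R, 3) voxel grid by nearest-voxel assignment.
    grid = np.zeros((resolution,) * 3 + (3,), dtype=np.float32)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    idx = ((points - mins) / (maxs - mins + 1e-8) * (resolution - 1)).astype(int)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = colors
    return grid

def multi_directional_videos(grid):
    # Slice the voxel grid along each spatial axis; each stack of 2D
    # slices is treated as one "video" of frames. Traversing every axis
    # forward and backward gives six directional videos.
    videos = []
    for axis in range(3):
        frames = np.moveaxis(grid, axis, 0)  # frames stacked along `axis`
        videos.append(frames)                # forward direction
        videos.append(frames[::-1])          # reverse direction
    return videos

# Hypothetical usage: each directional video would be passed to SAM 2's
# video predictor, with the user's 3D prompt (point, box, or mask)
# projected onto the frame that contains it as the starting prompt.
points = np.random.rand(10000, 3)
colors = np.random.rand(10000, 3)
videos = multi_directional_videos(voxelize(points, colors))
print(len(videos), videos[0].shape)  # 6 directional videos, each (64, 64, 64, 3)

Because the grid is segmented directly in 3D voxel space, this view of the pipeline needs no extra training and no 2D-3D projection step, matching the abstract's claim; the per-frame masks from each direction would then be fused back into a single 3D segmentation.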