

SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model

June 4, 2023
Authors: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai
cs.AI

Abstract

With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, was recently proposed and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and obtain promising results on the large-scale Waymo Open Dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
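The abstract's core idea is a BEV processing pipeline: project the LiDAR point cloud to a bird's-eye-view image, segment objects in that 2D image, and lift each mask back to a 3D box. The sketch below is our own illustration of that general recipe, not the authors' released code; the function names, coordinate ranges, and fixed-height heuristic are assumptions, and the segmentation mask is supplied by the caller (in the paper it would come from SAM run on the BEV image).

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), res=0.1):
    """Rasterize an (N, 4) point cloud [x, y, z, intensity] into a BEV
    intensity image. Ranges and resolution are illustrative defaults."""
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    cols = ((points[:, 0] - x_range[0]) / res).astype(int)
    rows = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    # ufunc.at handles duplicate cell indices correctly (keeps the max).
    np.maximum.at(bev, (rows[keep], cols[keep]), points[keep, 3])
    return bev

def mask_to_box3d(mask, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), res=0.1,
                  z_bottom=-1.8, height=1.7):
    """Lift a binary BEV mask to an axis-aligned 3D box (cx, cy, cz, l, w, h).
    The vertical extent is a fixed heuristic here; a real pipeline would
    estimate it from the points inside the mask."""
    rows, cols = np.nonzero(mask)
    x_min, x_max = cols.min() * res + x_range[0], cols.max() * res + x_range[0]
    y_min, y_max = rows.min() * res + y_range[0], rows.max() * res + y_range[0]
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    cz = z_bottom + height / 2.0
    return (cx, cy, cz, x_max - x_min, y_max - y_min, height)

# Toy usage: two points stand in for one segmented object. In the real
# pipeline, the mask would be produced by SAM on the BEV image instead
# of this simple occupancy threshold.
pts = np.array([[10.0, 0.0, -1.0, 0.5],
                [12.0, 2.0, -1.0, 0.9]])
bev = points_to_bev(pts)
box = mask_to_box3d(bev > 0)
```

The interesting design point is that segmentation happens entirely in 2D, so any strong 2D segmenter with zero-shot ability can be swapped in; only the rasterization and the mask-to-box lifting are 3D-specific.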