SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
June 4, 2023
Authors: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai
cs.AI
Abstract
With the development of large language models, many remarkable linguistic
systems like ChatGPT have thrived and achieved astonishing success on many
tasks, showing the incredible power of foundation models. In the spirit of
unleashing the capability of foundation models on vision tasks, the Segment
Anything Model (SAM), a vision foundation model for image segmentation, has
been proposed recently and presents strong zero-shot ability on many downstream
2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be
explored, especially 3D object detection. Motivated by this, we explore
adapting the zero-shot ability of SAM to 3D object detection in this paper. We
propose a SAM-powered BEV processing pipeline to detect objects and achieve
promising results on the large-scale Waymo Open Dataset. As an early attempt,
our method takes a step toward 3D object detection with vision foundation
models and presents the opportunity to unleash their power on 3D vision tasks.
The code is released at https://github.com/DYZhang09/SAM3D.
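The abstract describes a pipeline that rasterizes LiDAR points into a bird's-eye-view (BEV) image, segments it with SAM, and lifts the resulting masks back to metric boxes. The abstract gives no implementation details, so the following is a minimal, hypothetical NumPy sketch of the two geometric steps (BEV rasterization and mask-to-box conversion); the function names, grid ranges, and resolution are illustrative assumptions, and the SAM segmentation call itself is stubbed out in a comment. See the released code at https://github.com/DYZhang09/SAM3D for the authors' actual method.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), res=0.2):
    """Rasterize LiDAR points (N, 4: x, y, z, intensity) into a BEV intensity image.

    Hypothetical grid: x in [0, 80) m, y in [-40, 40) m, 0.2 m/pixel -> 400x400 image.
    """
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    cols = ((points[:, 0] - x_range[0]) / res).astype(int)
    rows = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    # Keep the max intensity per cell: a simple occupancy/intensity encoding.
    np.maximum.at(bev, (rows[keep], cols[keep]), points[keep, 3])
    return bev

def mask_to_bev_box(mask, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), res=0.2):
    """Convert a binary segmentation mask on the BEV image into an
    axis-aligned box (x1, y1, x2, y2) in meters."""
    rows, cols = np.nonzero(mask)
    x1 = x_range[0] + cols.min() * res
    x2 = x_range[0] + (cols.max() + 1) * res
    y1 = y_range[0] + rows.min() * res
    y2 = y_range[0] + (rows.max() + 1) * res
    return (x1, y1, x2, y2)

# In the full pipeline, SAM would segment the BEV image between these steps,
# e.g. (hypothetical call, requires the segment-anything package and weights):
#   masks = SamAutomaticMaskGenerator(sam_model).generate(bev_as_rgb)
```

The key design point the abstract implies is that no 3D-specific training is needed: all learning lives in SAM's 2D zero-shot segmentation, and the 3D-specific work reduces to deterministic geometry like the two functions above.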