SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model

Abstract

With the rise of large language models, language systems such as ChatGPT have achieved remarkable success across many tasks, demonstrating the power of foundation models. To bring that capability to vision, the Segment Anything Model (SAM), a vision foundation model for image segmentation, was recently proposed and shows strong zero-shot ability on many downstream 2D tasks. Whether SAM can be adapted to 3D vision tasks, and to 3D object detection in particular, has yet to be explored. Motivated by this gap, we explore adapting SAM's zero-shot ability to 3D object detection. We propose a SAM-powered bird's-eye-view (BEV) processing pipeline for detecting objects and obtain promising results on the large-scale Waymo Open Dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents an opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
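
The abstract does not spell out the individual BEV processing steps. As a rough illustration of what a BEV representation is, the sketch below rasterizes a 3D point cloud into a top-down grid of per-cell point counts, which is the kind of 2D image an image segmenter could then consume. It assumes the input is a list of (x, y, z) points in metres. The function name `points_to_bev`, the ranges, and the cell size are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres; assumed sensor frame


def points_to_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 40.0),    # hypothetical forward extent
    y_range: Tuple[float, float] = (-20.0, 20.0),  # hypothetical lateral extent
    cell_size: float = 0.5,                        # hypothetical grid resolution
) -> List[List[int]]:
    """Rasterize 3D points into a top-down grid of per-cell point counts.

    Rows index x (forward) and columns index y (lateral). The z coordinate
    is ignored here; a fuller pipeline might encode height or intensity.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell_size))
    n_cols = int(round((y_range[1] - y_range[0]) / cell_size))
    grid = [[0] * n_cols for _ in range(n_rows)]

    for x, y, _z in points:
        # Skip points that fall outside the chosen BEV window.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        row = min(int((x - x_range[0]) / cell_size), n_rows - 1)
        col = min(int((y - y_range[0]) / cell_size), n_cols - 1)
        grid[row][col] += 1

    return grid
```

Calling `points_to_bev(cloud)` on a list of points returns a nested list that can be scaled to pixel intensities and saved as an image. A box or mask found on that image maps back to metric coordinates by inverting the same range and cell-size arithmetic.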
