FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

Abstract

False negatives (FN) in 3D object detection, e.g., missed predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. Although such misses can be fatal, the issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from a massive number of object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. This advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark and 72.1 AMOTA on the nuScenes tracking benchmark, both ranking 1st on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.
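The multi-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `probe_hard_instances`, the square masking window, and the use of one fixed score grid for every stage are all assumptions made for illustration. The sketch only shows why masking regions already selected in earlier stages lets later stages surface weaker candidates that a single top-k pass would miss.

```python
def probe_hard_instances(heatmap, num_stages, k, radius=1):
    """Toy multi-stage candidate selection (hypothetical helper, not from the paper's code).

    heatmap: list of rows of objectness scores.
    Each stage picks the k best unmasked cells, then masks a square window
    around each pick so later stages must look elsewhere.
    Returns one list of (row, col) picks per stage.
    """
    height, width = len(heatmap), len(heatmap[0])
    masked = [[False] * width for _ in range(height)]
    stages = []
    for _ in range(num_stages):
        # Collect every cell that earlier stages have not masked out.
        candidates = [
            (heatmap[r][c], r, c)
            for r in range(height)
            for c in range(width)
            if not masked[r][c]
        ]
        # Highest score first; position breaks ties deterministically.
        candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
        picked = [(r, c) for _, r, c in candidates[:k]]
        stages.append(picked)
        # Accumulate the mask around this stage's picks (assumed square window).
        for r, c in picked:
            for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                    masked[rr][cc] = True
    return stages


# A strong peak at (1, 1), a strong neighbour at (1, 2), and a weak object at (3, 3).
heat = [[0.0] * 5 for _ in range(5)]
heat[1][1], heat[1][2], heat[3][3] = 0.9, 0.8, 0.4
print(probe_hard_instances(heat, num_stages=2, k=1))  # [[(1, 1)], [(3, 3)]]
```

A plain top-2 selection over the same grid would return (1, 1) and (1, 2), both on the easy object. With masking, the second stage reaches the weak response at (3, 3) instead.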

