Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

Abstract

Accurately predicting 3D occupancy grids from visual inputs is critical for autonomous driving, but current discriminative methods struggle with noisy data, incomplete observations, and the complex structures inherent in 3D scenes. In this work, we reframe 3D occupancy prediction as a generative modeling task using diffusion models, which learn the underlying data distribution and incorporate 3D scene priors. This approach improves prediction consistency and noise robustness, and it handles the intricacies of 3D spatial structures better than discriminative methods do. Our extensive experiments show that diffusion-based generative models outperform state-of-the-art discriminative approaches, delivering more realistic and accurate occupancy predictions, especially in occluded or low-visibility regions. Moreover, the improved predictions significantly benefit downstream planning tasks, highlighting the practical advantages of our method for real-world autonomous driving applications.
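
The abstract does not state which diffusion formulation is used. As a minimal sketch, assuming a standard DDPM-style Gaussian forward process with a linear noise schedule, the snippet below shows how an occupancy grid could be noised and how a noise-prediction loss would be computed. The grid size, the rescaling to {-1, +1}, the schedule parameters, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) and return it with the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def denoising_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))


rng = np.random.default_rng(0)

# Hypothetical 8 x 8 x 4 binary occupancy grid (1 = occupied), rescaled to {-1, +1}.
occupancy = (rng.random((8, 8, 4)) < 0.2).astype(np.float64)
x0 = 2.0 * occupancy - 1.0

alpha_bar = make_alpha_bar()
x_t, eps = forward_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)

# A trained denoiser would predict eps from (x_t, t, image features);
# here a zero prediction stands in for an untrained network.
loss = denoising_loss(np.zeros_like(eps), eps)
```

In a setup like this, the network would be conditioned on camera features, and sampling would start from pure noise and denoise step by step toward an occupancy grid. Whether the paper follows this exact recipe is not stated in the abstract.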
