DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion

Abstract

Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion, a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate the framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on two real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with QCNet informed by online HD maps, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.