DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion

Abstract

Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion, a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate the framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online-HD-map-informed QCNet, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.
