DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion
August 3, 2025
Authors: Zhigang Sun, Yiru Wang, Anqing Jiang, Shuo Wang, Yu Gao, Yuwen Heng, Shouyi Zhang, An He, Hao Jiang, Jinhao Chai, Zichong Gu, Wang Jijun, Shichen Tang, Lavdim Halilaj, Juergen Luettin, Hao Sun
cs.AI
Abstract
Autonomous driving requires accurate scene understanding, including road
geometry, traffic agents, and their semantic relationships. In online HD map
generation scenarios, raster-based representations are well-suited to vision
models but lack geometric precision, while graph-based representations retain
structural detail but become unstable without precise maps. To harness the
complementary strengths of both, we propose DiffSemanticFusion -- a fusion
framework for multimodal trajectory prediction and planning. Our approach
reasons over a semantic raster-fused BEV space, enhanced by a map diffusion
module that improves both the stability and expressiveness of online HD map
representations. We validate our framework on two downstream tasks: trajectory
prediction and planning-oriented end-to-end autonomous driving. Experiments on
real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate
improved performance over several state-of-the-art methods. For the prediction
task on nuScenes, we integrate DiffSemanticFusion with the online HD map
informed QCNet, achieving a 5.1% performance improvement. For end-to-end
autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art
results, with a 15% performance gain in NavHard scenarios. In addition,
extensive ablation and sensitivity studies show that our map diffusion module
can be seamlessly integrated into other vector-based approaches to enhance
performance. All artifacts are available at
https://github.com/SunZhigang7/DiffSemanticFusion.
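
The trade-off between raster-based and vector (graph-based) map representations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper `rasterize_polyline`, the 8x8 grid, and the 1 m resolution are all hypothetical choices made for illustration. It rasterizes map polylines into a BEV occupancy grid and shows that two lane boundaries 0.4 m apart become indistinguishable once rasterized, while their vector forms remain distinct.

```python
import math

def rasterize_polyline(points, grid_size=8, resolution=1.0):
    """Mark the BEV grid cells touched by a polyline given in metres.

    Hypothetical helper: the grid covers [0, grid_size * resolution) on both
    axes, and cell (row, col) = (floor(y / resolution), floor(x / resolution)).
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample so that adjacent samples are at most half a cell apart.
        steps = max(1, math.ceil(2 * length / resolution))
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = math.floor(x / resolution)
            row = math.floor(y / resolution)
            if 0 <= row < grid_size and 0 <= col < grid_size:
                grid[row][col] = 1
    return grid

# Two lane boundaries 0.4 m apart (assumed example geometry).
lane_a = [(0.5, 2.1), (7.5, 2.1)]
lane_b = [(0.5, 2.5), (7.5, 2.5)]

# The vector forms differ, but both fall into the same cells at 1 m resolution.
same_raster = rasterize_polyline(lane_a) == rasterize_polyline(lane_b)
same_vector = lane_a == lane_b
```

Here `same_raster` is true while `same_vector` is false, which is the loss of geometric precision the abstract attributes to raster representations; a finer resolution would separate the two lanes at the cost of a larger grid.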