DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion
August 3, 2025
Authors: Zhigang Sun, Yiru Wang, Anqing Jiang, Shuo Wang, Yu Gao, Yuwen Heng, Shouyi Zhang, An He, Hao Jiang, Jinhao Chai, Zichong Gu, Wang Jijun, Shichen Tang, Lavdim Halilaj, Juergen Luettin, Hao Sun
cs.AI
Abstract
Autonomous driving requires accurate scene understanding, including road
geometry, traffic agents, and their semantic relationships. In online HD map
generation scenarios, raster-based representations are well-suited to vision
models but lack geometric precision, while graph-based representations retain
structural detail but become unstable without precise maps. To harness the
complementary strengths of both, we propose DiffSemanticFusion, a fusion
framework for multimodal trajectory prediction and planning. Our approach
reasons over a semantic raster-fused BEV space, enhanced by a map diffusion
module that improves both the stability and expressiveness of online HD map
representations. We validate our framework on two downstream tasks: trajectory
prediction and planning-oriented end-to-end autonomous driving. Experiments on
real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate
improved performance over several state-of-the-art methods. For the prediction
task on nuScenes, we integrate DiffSemanticFusion with QCNet informed by
online HD maps, achieving a 5.1% performance improvement. For end-to-end
autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art
results, with a 15% performance gain in NavHard scenarios. In addition,
extensive ablation and sensitivity studies show that our map diffusion module
can be seamlessly integrated into other vector-based approaches to enhance
performance. All artifacts are available at
https://github.com/SunZhigang7/DiffSemanticFusion.
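
The trade-off between raster and vector map representations mentioned above can be illustrated with a small sketch. This is not the DiffSemanticFusion implementation: the function names, the 50 m BEV range, and the 0.5 m cell size are all assumptions chosen for illustration. The sketch rasterizes a lane polyline into a BEV occupancy grid and measures how far the original vertices can drift once only cell centers are kept, which is the loss of geometric precision that a raster encoding implies.

```python
import numpy as np


def rasterize_polyline(points, bev_range=50.0, resolution=0.5,
                       samples_per_segment=200):
    """Mark every BEV cell that a polyline passes through.

    points: (N, 2) array of x, y positions in metres (ego frame).
    Returns a boolean (H, W) grid covering [-bev_range, bev_range) on
    both axes. All parameter values are illustrative assumptions.
    """
    size = int(round(2 * bev_range / resolution))
    grid = np.zeros((size, size), dtype=bool)
    pts = np.asarray(points, dtype=float)
    # Interpolation weights along each segment, endpoints included.
    t = np.linspace(0.0, 1.0, samples_per_segment)[:, None]
    for start, end in zip(pts[:-1], pts[1:]):
        samples = start + t * (end - start)
        cells = np.floor((samples + bev_range) / resolution).astype(int)
        # Drop samples that fall outside the grid.
        inside = np.all((cells >= 0) & (cells < size), axis=1)
        cells = cells[inside]
        grid[cells[:, 1], cells[:, 0]] = True  # row = y index, col = x index
    return grid


def cell_centers(grid, bev_range=50.0, resolution=0.5):
    """Return the metric (x, y) centre of every occupied cell."""
    rows, cols = np.nonzero(grid)
    return np.stack([cols, rows], axis=1) * resolution - bev_range + resolution / 2


def max_quantization_error(points, grid, bev_range=50.0, resolution=0.5):
    """Largest distance from an original vertex to its nearest occupied cell centre."""
    pts = np.asarray(points, dtype=float)
    centers = cell_centers(grid, bev_range, resolution)
    dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    return dists.min(axis=1).max()


if __name__ == "__main__":
    lane = np.array([[0.1, 0.2], [10.3, 4.7], [20.0, 5.1]])  # made-up lane vertices
    grid = rasterize_polyline(lane)
    print("occupied cells:", int(grid.sum()))
    print("max vertex error (m):", max_quantization_error(lane, grid))
```

For vertices inside the grid, the error is bounded by half a cell diagonal (about 0.35 m at 0.5 m resolution). A vector or graph representation keeps the vertices exactly, but its quality then depends on how accurate the online map estimate is.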