DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion

Abstract

Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion, a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and the expressiveness of online HD map representations. We validate the framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on two real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online-HD-map-informed QCNet and achieve a 5.1% performance improvement. For end-to-end autonomous driving on NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that the map diffusion module can be seamlessly integrated into other vector-based approaches to enhance their performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.
