SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs

Abstract

The advancement of autonomous driving increasingly relies on high-quality annotated datasets, especially for 3D occupancy prediction, where the occupancy labels require dense 3D annotation and significant human effort. In this paper, we propose SyntheOcc, a diffusion model that Synthesizes photorealistic and geometrically controlled images by conditioning on Occupancy labels in driving scenarios. This yields an unlimited amount of diverse, annotated, and controllable data for applications such as training perception models and simulation. SyntheOcc addresses the critical challenge of how to efficiently encode 3D geometric information as conditional input to a 2D diffusion model. Our approach incorporates 3D semantic multi-plane images (MPIs) to provide comprehensive and spatially aligned 3D scene descriptions for conditioning. As a result, SyntheOcc can generate photorealistic multi-view images and videos that faithfully align with the given geometric labels (semantics in 3D voxel space). Extensive qualitative and quantitative evaluations on the nuScenes dataset demonstrate its effectiveness in generating controllable occupancy datasets that serve as effective data augmentation for perception models.
