Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Abstract

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, is limited in scale, in the diversity of sensor configurations, in geographic coverage, and in coverage of long-tail behavioral scenarios. In contrast, in-the-wild data from sources such as dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS that expect structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then uses a diffusion architecture to perform the generative conversion. We comprehensively and quantitatively evaluate the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.