Sensor2Sensor: Cross-Embodiment Sensor Translation for Autonomous Driving

Abstract

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, in the diversity of sensor configurations, and in geographic and long-tail behavioral coverage. In contrast, in-the-wild data from sources such as dashcams offer immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, such unstructured, in-the-wild video data are incompatible with ADS that expect structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering, which yields paired dashcam and AV-log samples. Sensor2Sensor then uses a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations of the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, thereby unlocking vast external data sources for AV development.