Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
October 22, 2025
Authors: Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang
cs.AI
Abstract
Recent advances in driving world models enable controllable generation of
high-quality RGB or multimodal videos. Existing methods focus primarily on
metrics related to generation quality and controllability, but they often
overlook evaluation on downstream perception tasks, which is crucial for
autonomous driving performance. These methods typically adopt a training
strategy that first pretrains on synthetic data and then fine-tunes on real
data, consuming twice the training epochs of the baseline (real data only).
When the baseline is given the same doubled epoch budget, the benefit of
synthetic data becomes negligible. To demonstrate the value of synthetic data
rigorously, we introduce Dream4Drive, a novel synthetic data generation
framework designed to enhance downstream perception tasks. Dream4Drive first
decomposes the input video into several 3D-aware guidance maps and then
renders 3D assets onto these guidance maps. Finally, a fine-tuned driving
world model produces edited, photorealistic multi-view videos that can be
used to train downstream perception models. Dream4Drive enables unprecedented
flexibility in generating multi-view corner cases at scale, significantly
boosting corner-case perception in autonomous driving. To facilitate future
research, we also contribute DriveObj3D, a large-scale 3D asset dataset that
covers the typical categories in driving scenarios and enables diverse
3D-aware video editing. Comprehensive experiments show that Dream4Drive
effectively boosts the performance of downstream perception models under
various training-epoch budgets.
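As a rough illustration of the generation pipeline summarized above (decompose into 3D-aware guidance maps, render a 3D asset onto them, then let the fine-tuned world model synthesize edited multi-view video), here is a minimal Python sketch. All names (`decompose_to_guidance_maps`, `render_asset_onto`, `FinetunedWorldModel`, the asset ID, and the six-camera view count) are hypothetical placeholders for exposition, not the authors' released API.

```python
# Hedged sketch of the Dream4Drive generation loop described in the abstract.
# Every function and class here is a placeholder assumption, not real code
# from the Dream4Drive repository.
from dataclasses import dataclass
from typing import List


@dataclass
class GuidanceMaps:
    """3D-aware conditioning maps decomposed from a source clip.

    The exact map set (depth, semantics, layout, ...) is an assumption here;
    the abstract only says the video is decomposed into several such maps.
    """
    maps: List[str]


def decompose_to_guidance_maps(video_path: str) -> GuidanceMaps:
    # Placeholder: decompose the input video into 3D-aware guidance maps.
    return GuidanceMaps(maps=[f"{video_path}:depth", f"{video_path}:semantic"])


def render_asset_onto(maps: GuidanceMaps, asset_id: str) -> GuidanceMaps:
    # Placeholder: render a 3D asset (e.g. one drawn from DriveObj3D) onto
    # the guidance maps so the edit stays geometrically consistent per view.
    return GuidanceMaps(maps=maps.maps + [f"asset:{asset_id}"])


class FinetunedWorldModel:
    def generate_multiview(self, maps: GuidanceMaps) -> List[str]:
        # Placeholder: the fine-tuned driving world model turns the edited
        # guidance maps into photorealistic multi-view video (6 views assumed).
        return [f"view_{i}({','.join(maps.maps)})" for i in range(6)]


def synthesize_corner_case(video_path: str, asset_id: str) -> List[str]:
    maps = decompose_to_guidance_maps(video_path)
    edited = render_asset_onto(maps, asset_id)
    return FinetunedWorldModel().generate_multiview(edited)


if __name__ == "__main__":
    frames = synthesize_corner_case("scene_0001.mp4", "construction_barrier")
    print(len(frames), "views generated")
```

The output of such a loop would then be mixed into the training set of a downstream perception model, which is where the abstract's epoch-matched comparison (sketched after the links below) comes in.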
Project page: https://wm-research.github.io/Dream4Drive/
GitHub: https://github.com/wm-research/Dream4Drive
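The abstract's fairness argument, that a synthetic-pretrain-then-real-finetune recipe consumes roughly twice the epochs of a real-only baseline, amounts to holding the total epoch budget fixed across recipes. A small schematic, with an illustrative (assumed) epoch count `N`:

```python
# Schematic of the epoch-matched comparison the abstract argues for.
# The per-stage budget N and the schedule names are illustrative assumptions.

N = 12  # hypothetical per-stage epoch budget

schedules = {
    "synthetic_pretrain_then_real": [("synthetic", N), ("real", N)],    # 2N epochs
    "naive_baseline":               [("real", N)],                      # N epochs
    "epoch_matched_baseline":       [("real", 2 * N)],                  # 2N epochs
}

def total_epochs(schedule):
    return sum(epochs for _, epochs in schedule)

# A fair comparison holds total epochs fixed: the baseline gets the same
# 2N budget as the synthetic recipe before any gain is attributed to
# synthetic data.
assert total_epochs(schedules["synthetic_pretrain_then_real"]) == \
       total_epochs(schedules["epoch_matched_baseline"])
```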