

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

October 22, 2025
作者: Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Wentao Zhang
cs.AI

Abstract

Recent advancements in driving world models enable controllable generation of high-quality RGB or multimodal videos. Existing methods primarily focus on metrics related to generation quality and controllability, while overlooking the evaluation of downstream perception tasks, which are crucial to autonomous driving performance. These methods typically adopt a training strategy that first pretrains on synthetic data and then fine-tunes on real data, resulting in twice as many training epochs as the baseline trained on real data only. When the baseline is given the same doubled epoch budget, the benefit of synthetic data becomes negligible. To rigorously assess the value of synthetic data, we introduce Dream4Drive, a novel synthetic data generation framework designed to enhance downstream perception tasks. Dream4Drive first decomposes the input video into several 3D-aware guidance maps and then renders 3D assets onto these guidance maps. Finally, a driving world model is fine-tuned to produce the edited, multi-view photorealistic videos, which can be used to train downstream perception models. Dream4Drive enables unprecedented flexibility in generating multi-view corner cases at scale, significantly boosting corner-case perception in autonomous driving. To facilitate future research, we also contribute a large-scale 3D asset dataset named DriveObj3D, which covers the typical categories in driving scenarios and enables diverse 3D-aware video editing. Comprehensive experiments show that Dream4Drive effectively boosts the performance of downstream perception models under various training-epoch budgets. Project page: https://wm-research.github.io/Dream4Drive/ GitHub: https://github.com/wm-research/Dream4Drive
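
For readers skimming the pipeline, below is a minimal sketch of the three-stage generation loop the abstract describes (decompose into 3D-aware guidance maps, render a 3D asset onto them, regenerate with the fine-tuned world model). All names here (decompose_to_guidance_maps, render_asset, WorldModel, synthesize_corner_case) are hypothetical placeholders for illustration only; the authors' actual interfaces are in the GitHub repository linked above.

```python
# Hypothetical sketch of the Dream4Drive pipeline described in the abstract.
# Function and class names are illustrative placeholders, not the authors' API.

from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class GuidanceMaps:
    # 3D-aware conditioning signals decomposed from the input clip,
    # one entry per camera view and frame.
    views: List[Any] = field(default_factory=list)


def decompose_to_guidance_maps(video: Any) -> GuidanceMaps:
    """Step 1: decompose the input multi-view video into 3D-aware guidance maps."""
    raise NotImplementedError  # placeholder for the real decomposition


def render_asset(maps: GuidanceMaps, asset: Any, pose: Any) -> GuidanceMaps:
    """Step 2: render a 3D asset (e.g. one from DriveObj3D) onto the guidance
    maps at the desired pose, yielding an edited scene layout."""
    raise NotImplementedError  # placeholder for the real renderer


class WorldModel:
    """Step 3: a driving world model fine-tuned to turn edited guidance maps
    into multi-view photorealistic video."""

    def generate(self, maps: GuidanceMaps) -> Any:
        raise NotImplementedError  # placeholder for the real generator


def synthesize_corner_case(video: Any, asset: Any, pose: Any,
                           model: WorldModel) -> Any:
    """Full loop: produce an edited video usable as extra perception
    training data, e.g. a rare obstacle inserted into a normal drive."""
    maps = decompose_to_guidance_maps(video)
    edited = render_asset(maps, asset, pose)
    return model.generate(edited)
```

Under this reading, corner-case mining reduces to choosing which asset to insert and where; the perception model is then trained on a mix of real clips and these edited clips, with the total epoch budget matched to the real-data-only baseline for a fair comparison.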