Toward Physically Consistent Driving Video World Models under Challenging Trajectories
March 25, 2026
Authors: Jiawei Zhou, Zhenxin Zhu, Lingyi Du, Linye Lyu, Lijun Zhou, Zhanqian Wu, Hongcheng Luo, Zhuotao Tian, Bing Wang, Guang Chen, Hangjun Ye, Haiyang Sun, Yu Li
cs.AI
Abstract
Video generation models have shown strong potential as world models for autonomous driving simulation. However, existing approaches are primarily trained on real-world driving datasets, which mostly contain natural and safe driving scenarios. As a result, current models often fail when conditioned on challenging or counterfactual trajectories, such as imperfect trajectories generated by simulators or planning systems, and produce videos with severe physical inconsistencies and artifacts. To address this limitation, we propose PhyGenesis, a world model designed to generate driving videos with high visual fidelity and strong physical consistency. Our framework consists of two key components: (1) a physical condition generator that transforms potentially invalid trajectory inputs into physically plausible conditions, and (2) a physics-enhanced video generator that produces high-fidelity multi-view driving videos under these conditions. To train these components effectively, we construct a large-scale, physics-rich heterogeneous dataset. Specifically, in addition to real-world driving videos, we generate diverse challenging driving scenarios using the CARLA simulator, from which we derive supervision signals that guide the model to learn physically grounded dynamics under extreme conditions. This challenging-trajectory learning strategy enables trajectory correction and promotes physically consistent video generation. Extensive experiments demonstrate that PhyGenesis consistently outperforms state-of-the-art methods, especially on challenging trajectories. Our project page is available at: https://wm-research.github.io/PhyGenesis/.
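The first stage of the two-component design, turning a possibly invalid trajectory into a physically plausible condition before any video is generated, can be illustrated with a toy example. The sketch below is not the PhyGenesis method: the paper's physical condition generator is a learned model, whereas this stand-in is a hand-written rule that only caps the speed implied by consecutive waypoints. The function name `limit_step_lengths` and the parameters `dt` and `max_speed` are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def limit_step_lengths(waypoints: List[Point], dt: float, max_speed: float) -> List[Point]:
    """Shorten any step whose implied speed exceeds max_speed.

    Hypothetical, rule-based stand-in for a learned physical condition
    generator: it keeps each step's direction but caps its length at
    max_speed * dt, then rebuilds the path from the corrected steps.
    """
    if not waypoints:
        return []
    max_step = max_speed * dt
    corrected = [waypoints[0]]
    for prev_raw, cur_raw in zip(waypoints, waypoints[1:]):
        # Displacement requested by the raw (possibly invalid) trajectory.
        dx = cur_raw[0] - prev_raw[0]
        dy = cur_raw[1] - prev_raw[1]
        dist = (dx * dx + dy * dy) ** 0.5
        # Leave feasible steps untouched; shrink infeasible ones.
        scale = 1.0 if dist <= max_step else max_step / dist
        last = corrected[-1]
        corrected.append((last[0] + dx * scale, last[1] + dy * scale))
    return corrected


# The second step asks for 10 m in 0.5 s (20 m/s), above the 10 m/s cap.
raw = [(0.0, 0.0), (1.0, 0.0), (11.0, 0.0)]
print(limit_step_lengths(raw, dt=0.5, max_speed=10.0))
# prints [(0.0, 0.0), (1.0, 0.0), (6.0, 0.0)]
```

In a pipeline shaped like the one the abstract describes, the corrected waypoints rather than the raw ones would be passed to the video generator as its conditioning signal, so that the generator is never asked to render motion that violates the assumed physical limits.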