DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

January 4, 2026
Authors: Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander
cs.AI

Abstract

Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal evolution of complex scenes. In autonomous driving, this vision gives rise to driving world models: generative simulators that imagine ego and agent futures, enabling scalable simulation, safe testing of corner cases, and rich synthetic data generation. Yet, despite fast-growing research activity, the field lacks a rigorous benchmark to measure progress and guide priorities. Existing evaluations remain limited: generic video metrics overlook safety-critical imaging factors; trajectory plausibility is rarely quantified; temporal and agent-level consistency is neglected; and controllability with respect to ego conditioning is ignored. Moreover, current datasets fail to cover the diversity of conditions required for real-world deployment. To address these gaps, we present DrivingGen, the first comprehensive benchmark for generative driving world models. DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources, spanning varied weather, time of day, geographic regions, and complex maneuvers, with a suite of new metrics that jointly assess visual realism, trajectory plausibility, temporal coherence, and controllability. Benchmarking 14 state-of-the-art models reveals clear trade-offs: general models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality. DrivingGen offers a unified evaluation framework to foster reliable, controllable, and deployable driving world models, enabling scalable simulation, planning, and data-driven decision-making.
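
The abstract names four evaluation axes (visual realism, trajectory plausibility, temporal coherence, controllability) that are assessed jointly. Below is a minimal, hypothetical Python sketch of how per-axis scores for generated clips might be aggregated into a single benchmark report; the class, function names, weights, and score ranges are illustrative assumptions, not the actual DrivingGen metric suite.

```python
# Hypothetical sketch: aggregating per-axis scores over a set of generated clips.
# Names, weights, and [0, 1] score ranges are illustrative assumptions, not the
# actual DrivingGen metrics or API.
from dataclasses import dataclass


@dataclass
class ClipScores:
    visual_realism: float           # e.g., a perceptual-quality score in [0, 1]
    trajectory_plausibility: float  # e.g., a kinematic-feasibility score in [0, 1]
    temporal_coherence: float       # e.g., a frame/agent consistency score in [0, 1]
    controllability: float          # e.g., agreement with ego conditioning in [0, 1]


def aggregate(scores: list[ClipScores],
              weights: dict[str, float] | None = None) -> dict[str, float]:
    """Average each axis over all clips and form a weighted overall score."""
    weights = weights or {
        "visual_realism": 0.25,
        "trajectory_plausibility": 0.25,
        "temporal_coherence": 0.25,
        "controllability": 0.25,
    }
    n = len(scores)
    per_axis = {
        axis: sum(getattr(s, axis) for s in scores) / n
        for axis in weights
    }
    per_axis["overall"] = sum(weights[a] * per_axis[a] for a in weights)
    return per_axis


if __name__ == "__main__":
    clips = [
        ClipScores(0.82, 0.64, 0.71, 0.58),
        ClipScores(0.79, 0.70, 0.75, 0.61),
    ]
    print(aggregate(clips))
```

A report of this shape would make the trade-off noted in the abstract visible directly: general-purpose video models would score high on visual realism but low on trajectory plausibility, while driving-specific models would show the opposite pattern.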