GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
May 28, 2025
Authors: Anthony Chen, Wenzhao Zheng, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Kurt Keutzer, Shanghang Zhang
cs.AI
Abstract
Recent advancements in world models have revolutionized dynamic environment
simulation, allowing systems to foresee future states and assess potential
actions. In autonomous driving, these capabilities help vehicles anticipate the
behavior of other road users, perform risk-aware planning, accelerate training
in simulation, and adapt to novel scenarios, thereby enhancing safety and
reliability. Current approaches either fail to maintain robust 3D geometric
consistency or accumulate artifacts when handling occlusions; both capabilities
are critical for reliable safety assessment in autonomous navigation tasks. To
address this, we introduce GeoDrive, which explicitly integrates robust 3D
geometry conditions into driving world models to enhance spatial understanding
and action controllability. Specifically, we first extract a 3D representation
from the input frame and then obtain its 2D rendering based on the
user-specified ego-car trajectory. To enable dynamic modeling, we propose a
dynamic editing module during training to enhance the renderings by editing the
positions of the vehicles. Extensive experiments demonstrate that our method
significantly outperforms existing models in both action accuracy and 3D
spatial awareness, leading to more realistic, adaptable, and reliable scene
modeling for safer autonomous driving. Additionally, our model can generalize
to novel trajectories and offers interactive scene editing capabilities, such
as object editing and object trajectory control.
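
The step "extract a 3D representation from the input frame, then render it in 2D along the ego-car trajectory" can be illustrated with a small geometric sketch. This is not the authors' implementation: the abstract does not say which 3D representation or renderer GeoDrive uses. The sketch assumes a pinhole camera, a known metric depth per pixel, and planar ego motion (a forward shift plus a yaw turn). All function and parameter names are hypothetical.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumed convention: x right, y down, z forward (pinhole model).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def to_new_camera(point, forward, yaw):
    """Express a source-frame point in the frame of a moved camera.

    Assumption: the camera moves `forward` metres along +z, then turns by
    `yaw` radians about the y axis (positive yaw turns towards +x).
    """
    x, y, z = point
    z -= forward                      # undo the translation
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse of the yaw rotation about the y axis.
    return (c * x - s * z, y, s * x + c * z)

def project(point, fx, fy, cx, cy):
    """Project a camera-frame point to pixel coordinates, or None if behind."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def reproject_pixel(u, v, depth, intrinsics, forward, yaw):
    """Where does a source pixel land after the hypothetical ego motion?"""
    fx, fy, cx, cy = intrinsics
    p = backproject(u, v, depth, fx, fy, cx, cy)
    q = to_new_camera(p, forward, yaw)
    return project(q, fx, fy, cx, cy)
```

Applying `reproject_pixel` to every pixel of the input frame for each pose on the trajectory yields a sparse rendering per future step. Pixels that become visible only after the motion have no source depth and stay empty, which is one place where occlusion artifacts of the kind the abstract mentions can arise.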