GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
May 28, 2025
Authors: Anthony Chen, Wenzhao Zheng, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Kurt Keutzer, Shanghang Zhang
cs.AI
Abstract
Recent advancements in world models have revolutionized dynamic environment
simulation, allowing systems to foresee future states and assess potential
actions. In autonomous driving, these capabilities help vehicles anticipate the
behavior of other road users, perform risk-aware planning, accelerate training
in simulation, and adapt to novel scenarios, thereby enhancing safety and
reliability. Current approaches either fail to maintain robust 3D
geometric consistency or accumulate artifacts when handling occlusions, and
both shortcomings undermine reliable safety assessment in autonomous navigation tasks. To
address this, we introduce GeoDrive, which explicitly integrates robust 3D
geometry conditions into driving world models to enhance spatial understanding
and action controllability. Specifically, we first extract a 3D representation
from the input frame and then obtain its 2D rendering based on the
user-specified ego-car trajectory. To enable dynamic modeling, we propose a
dynamic editing module that, during training, enhances the renderings by editing the
positions of the vehicles. Extensive experiments demonstrate that our method
significantly outperforms existing models in both action accuracy and 3D
spatial awareness, leading to more realistic, adaptable, and reliable scene
modeling for safer autonomous driving. Additionally, our model can generalize
to novel trajectories and offers interactive scene editing capabilities, such
as object editing and object trajectory control.
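The two-step conditioning described in the abstract (lift the input frame to a 3D representation, then render it along a user-specified ego trajectory) can be illustrated with a minimal sketch. The abstract does not specify the 3D representation or the renderer, so everything below is an assumption made for illustration: a per-pixel depth map lifted to a point cloud with a pinhole camera, an ego pose reduced to a yaw angle plus a translation, and the hypothetical helper names `unproject` and `render_from_pose`. It is not the authors' implementation.

```python
import math

def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map (rows of depths) to 3D points in the source camera frame.

    Assumed convention: x right, y down, z forward; depth <= 0 marks invalid pixels.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def render_from_pose(points, yaw, t, fx, fy, cx, cy, width, height):
    """Project points into a camera placed at a new ego pose.

    The new camera sits at position t (expressed in the source camera frame)
    and is rotated by yaw about the vertical (y) axis. Returns a sparse depth
    image; pixels that no point lands on stay None.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        # Express the point in the new camera frame: p' = R^T (p - t).
        dx, dy, dz = x - t[0], y - t[1], z - t[2]
        xc = c * dx - s * dz
        yc = dy
        zc = s * dx + c * dz
        if zc <= 0:
            continue  # point is behind the new camera
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Z-buffer: keep the nearest surface at each pixel.
            if image[v][u] is None or zc < image[v][u]:
                image[v][u] = zc
    return image

# Usage: one valid depth pixel, camera moved 4 units forward along z.
depth = [[0, 0, 0], [0, 10.0, 0], [0, 0, 0]]
cloud = unproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
image = render_from_pose(cloud, yaw=0.0, t=(0.0, 0.0, 4.0),
                         fx=1.0, fy=1.0, cx=1.0, cy=1.0, width=3, height=3)
```

Pixels left as `None` are regions the new viewpoint sees but the source frame did not observe. In a setup like this, such holes would be left for the generative model to fill, while the projected pixels tie the output to the requested trajectory.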