GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

Abstract

Recent advances in world models have transformed dynamic environment simulation, allowing systems to predict future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches either fail to maintain robust 3D geometric consistency or accumulate artifacts when handling occlusions, and both geometric consistency and clean occlusion handling are critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering along the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module that, during training, enhances the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model generalizes to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.
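The extract-then-render step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the 3D representation is a point cloud lifted from a per-pixel depth map with a pinhole camera, and all function names and parameters (unproject, to_new_camera, render_depth, fx, fy, cx, cy) are hypothetical. It shows how moving the camera along an ego trajectory yields a new 2D rendering, including the holes that such a rendering leaves behind.

```python
import math

def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map (rows of metric depths) to 3D points in the camera frame.
    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def to_new_camera(point, yaw, translation):
    """Express a source-frame point in a camera that has moved by `translation`
    (given in source-frame coordinates) and turned by `yaw` about the y axis."""
    x, y, z = (p - t for p, t in zip(point, translation))
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse (transpose) of the rotation about the y axis.
    return (c * x - s * z, y, s * x + c * z)

def render_depth(points, fx, fy, cx, cy, width, height):
    """Project points into an image, keeping the nearest depth per pixel.
    Pixels that receive no point stay 0.0 (holes)."""
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera
            continue
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if image[v][u] == 0.0 or z < image[v][u]:
                image[v][u] = z
    return image

# Toy scene: a flat wall 10 m ahead, seen by a 5x5 camera.
K = dict(fx=2.0, fy=2.0, cx=2.0, cy=2.0)
wall = [[10.0] * 5 for _ in range(5)]
cloud = unproject(wall, **K)

# Hypothetical ego step: drive 5 m straight ahead, no turning.
moved = [to_new_camera(p, yaw=0.0, translation=(0.0, 0.0, 5.0)) for p in cloud]
view = render_depth(moved, width=5, height=5, **K)
```

In this toy scene the wall appears closer after the move and the projected points spread apart, so some pixels in view receive no point at all. A reprojected rendering of this kind gives geometric guidance but is incomplete on its own, which is why it is used as a condition for a generative model rather than as the final output.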
