Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
October 7, 2025
Authors: Jiahao Wang, Zhenpei Yang, Yijing Bai, Yingwei Li, Yuliang Zou, Bo Sun, Abhijit Kundu, Jose Lezama, Luna Yue Huang, Zehao Zhu, Jyh-Jing Hwang, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang
cs.AI
Abstract
Recent advances in generative models have sparked exciting new possibilities
in the field of autonomous vehicles. Specifically, video generation models are
now being explored as controllable virtual testing environments.
Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined
alternative to conventional modular autonomous driving systems, gaining
popularity for their simplicity and scalability. However, the application of
these techniques to simulation and planning raises important questions. First,
while video generation models can generate increasingly realistic videos, can
these videos faithfully adhere to the specified conditions and be realistic
enough for E2E autonomous planner evaluation? Second, given that data is
crucial for understanding and controlling E2E planners, how can we gain deeper
insights into their biases and improve their ability to generalize to
out-of-distribution scenarios? In this work, we bridge the gap between
driving models and generative world models (Drive&Gen) to address these
questions. We propose novel statistical measures leveraging E2E drivers to
evaluate the realism of generated videos. By exploiting the controllability of
the video generation model, we conduct targeted experiments to investigate
distribution gaps affecting E2E planner performance. Finally, we show that
synthetic data produced by the video generation model offers a cost-effective
alternative to real-world data collection. This synthetic data effectively
improves E2E model generalization beyond existing Operational Design Domains,
facilitating the expansion of autonomous vehicle services into new operational
contexts.