Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models

Abstract

Recent advances in generative models have sparked exciting new possibilities in the field of autonomous vehicles. Specifically, video generation models are now being explored as controllable virtual testing environments. Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. However, applying these techniques to simulation and planning raises important questions. First, while video generation models can generate increasingly realistic videos, can these videos faithfully adhere to the specified conditions and be realistic enough for E2E autonomous planner evaluation? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insights into their biases and improve their ability to generalize to out-of-distribution scenarios? In this work, we bridge the gap between driving models and generative world models (Drive&Gen) to address these questions. We propose novel statistical measures leveraging E2E drivers to evaluate the realism of generated videos. By exploiting the controllability of the video generation model, we conduct targeted experiments to investigate distribution gaps affecting E2E planner performance. Finally, we show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection. This synthetic data effectively improves E2E model generalization beyond existing Operational Design Domains, facilitating the expansion of autonomous vehicle services into new operational contexts.
