Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models

Abstract

Recent advances in generative models have opened new possibilities in the field of autonomous vehicles. In particular, video generation models are being explored as controllable virtual testing environments. At the same time, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. Applying these techniques to simulation and planning, however, raises two important questions. First, although video generation models can produce increasingly realistic videos, do these videos faithfully adhere to the specified conditions, and are they realistic enough for evaluating E2E autonomous planners? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insight into their biases and improve their ability to generalize to out-of-distribution scenarios? In this work, we bridge the gap between driving models and generative world models (Drive&Gen) to address these questions. We propose novel statistical measures that leverage E2E drivers to evaluate the realism of generated videos. By exploiting the controllability of the video generation model, we conduct targeted experiments to investigate the distribution gaps that affect E2E planner performance. Finally, we show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection. This synthetic data effectively improves E2E model generalization beyond existing Operational Design Domains, facilitating the expansion of autonomous vehicle services into new operational contexts.

