PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

Abstract

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV (Bird's Eye View) feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels), a novel and efficient end-to-end driving architecture that operates on camera data alone, without an explicit BEV representation and without LiDAR. PRIX couples a visual feature extractor with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of the architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to enhance multi-level visual features for more robust planning. Comprehensive experiments demonstrate that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in inference speed and model size, which makes it a practical solution for real-world deployment. Our work is open-source, and the code will be available at https://maxiuw.github.io/prix.
