NVIDIA OmniDreams: A Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Abstract

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment: its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and the AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only one-fifth of the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
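The closed-loop interaction described in the abstract can be illustrated with a minimal Python sketch. It is not the OmniDreams, Alpamayo, or AlpaSim API: every name below (`SimState`, `update_state`, `toy_policy`, `toy_world_model`, `closed_loop_rollout`) is hypothetical, the "frames" are plain lists of numbers, and the state update is a toy one-dimensional kinematic step chosen only to make the loop runnable. The sketch shows only the data flow: the policy reads past frames, its action updates the simulator state, and the next frame is generated conditioned on past frames, the current state, and the action.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Tuple

Frame = List[float]            # stand-in for a generated sensor frame (assumption)
Action = Tuple[float, float]   # hypothetical (steering, acceleration) pair


@dataclass
class SimState:
    """Toy simulator state; a real orchestrator would track far more."""
    x: float = 0.0
    speed: float = 0.0
    step: int = 0


def update_state(state: SimState, action: Action, dt: float = 0.1) -> SimState:
    """Toy 1-D kinematic update (assumption): integrate acceleration, then position."""
    _steer, accel = action
    speed = max(0.0, state.speed + accel * dt)
    return SimState(x=state.x + speed * dt, speed=speed, step=state.step + 1)


def toy_policy(history: List[Frame]) -> Action:
    """Placeholder policy: always drive straight with constant acceleration."""
    return (0.0, 1.0)


def toy_world_model(history: List[Frame], state: SimState, action: Action) -> Frame:
    """Placeholder generator: the 'frame' simply encodes the conditioning state."""
    return [state.x, state.speed]


def closed_loop_rollout(
    policy: Callable[[List[Frame]], Action],
    world_model: Callable[[List[Frame], SimState, Action], Frame],
    first_frame: Frame,
    n_steps: int,
    context_len: int = 4,
) -> List[Tuple[SimState, Action, Frame]]:
    """Run policy and world model in a loop, feeding generated frames back in."""
    history = deque([first_frame], maxlen=context_len)  # bounded past-frame context
    state = SimState()
    trajectory = []
    for _ in range(n_steps):
        action = policy(list(history))                     # policy reacts to past frames
        state = update_state(state, action)                # action updates simulator state
        frame = world_model(list(history), state, action)  # action-conditioned generation
        history.append(frame)                              # autoregressive feedback
        trajectory.append((state, action, frame))
    return trajectory
```

Calling `closed_loop_rollout(toy_policy, toy_world_model, [0.0, 0.0], n_steps=3)` returns one `(state, action, frame)` tuple per step. The bounded `deque` is a design choice for the sketch: it caps how many past frames the policy and the generator can see, which is one simple way to keep per-step cost constant in an autoregressive loop.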