NVIDIA OmniDreams: A Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Abstract

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment: its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only one fifth of the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
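
The closed-loop interaction described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: `toy_policy`, `step_state`, `toy_world_model`, and the dictionary "frames" are not the OmniDreams, Alpamayo, or AlpaSim interfaces, and the real world model generates video rather than a small record. The sketch only shows the data flow: the policy acts on the latest observation, the action updates the simulator state, and the next observation is generated conditioned on past frames, the current state, and the immediate action.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SimState:
    x: float = 0.0      # ego position along the lane, in meters (assumed toy state)
    speed: float = 0.0  # ego speed, in m/s


def toy_policy(frame):
    """Hypothetical policy: accelerate until the observed speed reaches 10 m/s."""
    return 1.0 if frame["speed"] < 10.0 else 0.0


def step_state(state, accel, dt):
    """Hypothetical orchestrator step: integrate the ego state under the action."""
    speed = state.speed + accel * dt
    return SimState(x=state.x + speed * dt, speed=speed)


def toy_world_model(history, state, accel):
    """Stand-in for the generative model: emit the next 'frame' conditioned on
    past frames, the current simulator state, and the immediate action."""
    return {
        "t": history[-1]["t"] + 1,
        "x": state.x,
        "speed": state.speed,
        "accel": accel,
    }


def closed_loop_rollout(steps, dt=0.1, context=4):
    """Run the policy and the world model in a loop for a fixed number of steps."""
    state = SimState()
    # Bounded history of past frames, standing in for the autoregressive context.
    history = deque([{"t": 0, "x": 0.0, "speed": 0.0, "accel": 0.0}], maxlen=context)
    frames = []
    for _ in range(steps):
        action = toy_policy(history[-1])                 # policy reacts to the last frame
        state = step_state(state, action, dt)            # action updates simulator state
        frame = toy_world_model(history, state, action)  # next observation is generated
        history.append(frame)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    for frame in closed_loop_rollout(steps=3):
        print(frame)
```

The bounded `deque` reflects one plausible design choice, namely that an autoregressive generator conditions on a fixed-length window of recent frames. The abstract does not state the actual context length or conditioning format.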