NVIDIA OmniDreams: A Real-Time Generative World Model for Closed-Loop Autonomous Driving Simulation

Abstract

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
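The closed-loop interaction described above can be illustrated with a minimal Python sketch. It is a toy stand-in, not the OmniDreams, Alpamayo, or AlpaSim implementation: `SimState`, `toy_policy`, `step_state`, `toy_world_model`, and `run_closed_loop` are all hypothetical names, and a "frame" here is a small dictionary rather than a generated image. The sketch only shows the data flow: the policy reads the latest frame, its action updates the simulator state, and the next frame is produced autoregressively from a bounded history of past frames, the current state, and the immediate action.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class SimState:
    """Toy ego state tracked by the orchestrator (hypothetical)."""
    x: float = 0.0      # longitudinal position in metres
    speed: float = 0.0  # speed in metres per second


def toy_policy(frame):
    """Stand-in driving policy: map the latest frame to an acceleration."""
    # Accelerate until the observed speed reaches roughly 10 m/s.
    return 1.0 if frame["speed"] < 10.0 else 0.0


def step_state(state, accel, dt=0.1):
    """Stand-in orchestrator update: integrate the action into the state."""
    speed = state.speed + accel * dt
    return SimState(x=state.x + speed * dt, speed=speed)


def toy_world_model(past_frames, state, accel, rng):
    """Stand-in generator: the next frame depends on history, state, action."""
    return {
        "context_len": len(past_frames),  # how many past frames were used
        "x": state.x,
        "speed": state.speed,
        "accel": accel,
        "noise": rng.random(),            # placeholder for sampled content
    }


def run_closed_loop(num_steps=200, context=4, seed=0):
    """Roll out policy and world model together, one frame at a time."""
    rng = random.Random(seed)
    state = SimState()
    # Bounded history: only the most recent `context` frames are kept.
    frames = deque([{"speed": 0.0}], maxlen=context)
    log = []
    for _ in range(num_steps):
        accel = toy_policy(frames[-1])          # policy acts on latest frame
        state = step_state(state, accel)        # action updates the state
        frame = toy_world_model(list(frames), state, accel, rng)
        frames.append(frame)                    # autoregressive feedback
        log.append(frame)
    return state, log


if __name__ == "__main__":
    final_state, log = run_closed_loop()
    print(f"steps={len(log)} x={final_state.x:.1f} m speed={final_state.speed:.1f} m/s")
```

The bounded `deque` reflects one plausible design choice for a real-time autoregressive generator, namely that each step conditions on a fixed-length window of recent frames so that per-step cost does not grow with rollout length. The abstract does not state the context length or the conditioning interface that OmniDreams actually uses.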