NVIDIA OmniDreams: A Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Abstract

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment: its actions update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initially captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos together with mid- and post-training on 21,000 hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it conditions its photorealistic, autoregressive sensor generation on past frames, the current simulator state, and the immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and the AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only one-fifth of the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
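
To make the closed-loop interaction described above concrete, the following is a minimal sketch of the control flow only. Every name in it (`ToyPolicy`, `ToyWorldModel`, `EgoState`, `Action`, `step_state`, `closed_loop_rollout`) is a hypothetical stand-in introduced for illustration; none of it is the OmniDreams, Alpamayo, or AlpaSim API. A "frame" here is a short list of numbers rather than an image, and the "world model" is a trivial function rather than a diffusion model. The one thing the sketch is meant to show is the dependency structure: the policy acts on the latest generated frame, the action updates the simulator state, and the next frame is generated from past frames, the current state, and the immediate action.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

# Stand-in for a sensor image: [position, speed, last_accel, context_size].
Frame = List[float]


@dataclass
class EgoState:
    """Hypothetical simulator state: 1-D position and speed of the ego vehicle."""
    x: float = 0.0
    speed: float = 0.0


@dataclass
class Action:
    """Hypothetical driving action: longitudinal acceleration."""
    accel: float


class ToyPolicy:
    """Stand-in policy: accelerates toward a target speed read from the frame."""

    def __init__(self, target_speed: float = 10.0, gain: float = 0.5) -> None:
        self.target_speed = target_speed
        self.gain = gain

    def act(self, frame: Frame) -> Action:
        observed_speed = frame[1]
        return Action(accel=self.gain * (self.target_speed - observed_speed))


class ToyWorldModel:
    """Stand-in generator: builds the next frame from history, state, and action.

    A real generative world model would synthesize video here; this toy simply
    encodes its three conditioning inputs into the output frame.
    """

    def generate(self, past_frames: List[Frame], state: EgoState, action: Action) -> Frame:
        return [state.x, state.speed, action.accel, float(len(past_frames))]


def step_state(state: EgoState, action: Action, dt: float) -> EgoState:
    """Assumed toy dynamics: integrate acceleration, then position."""
    speed = max(0.0, state.speed + action.accel * dt)
    return EgoState(x=state.x + speed * dt, speed=speed)


def closed_loop_rollout(policy: ToyPolicy, world_model: ToyWorldModel,
                        steps: int, dt: float = 0.1,
                        context_len: int = 4) -> List[EgoState]:
    state = EgoState()
    # Bounded context window of past frames (autoregressive conditioning).
    history = deque(maxlen=context_len)
    history.append([state.x, state.speed, 0.0, 0.0])
    trajectory = []
    for _ in range(steps):
        action = policy.act(history[-1])          # policy reacts to latest frame
        state = step_state(state, action, dt)     # action updates simulator state
        frame = world_model.generate(list(history), state, action)
        history.append(frame)                     # new frame feeds the next step
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    final = closed_loop_rollout(ToyPolicy(), ToyWorldModel(), steps=50)[-1]
    print(f"final position={final.x:.2f}, speed={final.speed:.2f}")
```

Because each generated frame depends on the policy's previous action, replaying a fixed log would not reproduce this loop; that dependence is what separates closed-loop from open-loop evaluation in the sense used above.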