Hallucination in World Models Is Predictable and Preventable

Abstract

Modern generative world models render increasingly realistic, action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, where lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes (perceptual, action-marginalized, and scene-diverging), each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation. An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2.
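As an illustration only, the idea that a coverage signal can drive both detection and sampling can be sketched with a generic proxy. The abstract does not specify the three signals or the sampling technique, so the sketch below substitutes a simple k-nearest-neighbor distance in a hypothetical state-action feature space. The function names, the feature vectors, and the `k` and `temperature` parameters are all assumptions made for this example, not the paper's method.

```python
import numpy as np


def knn_coverage_gap(train_feats, query_feats, k=3):
    """Mean distance from each query to its k nearest training points.

    Assumption: larger values stand in for lower data coverage. This is a
    generic proxy, not one of the three signals named in the abstract.
    """
    diffs = query_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)           # (n_query, n_train)
    k = min(k, train_feats.shape[0])
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def coverage_aware_weights(train_feats, k=3, temperature=1.0):
    """Sampling probabilities that upweight sparsely covered training points.

    Hypothetical scheme: softmax over each point's leave-one-out k-NN gap.
    """
    diffs = train_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)                  # exclude self-matches
    k = min(k, train_feats.shape[0] - 1)
    gap = np.partition(dists, k - 1, axis=1)[:, :k].mean(axis=1)
    logits = gap / temperature
    logits -= logits.max()                           # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Toy data: a dense 10x10 grid (well covered) plus four isolated points.
grid = np.stack(np.meshgrid(np.arange(10), np.arange(10)), axis=-1)
dense = grid.reshape(-1, 2) * 0.1
sparse = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
train = np.concatenate([dense, sparse], axis=0)

weights = coverage_aware_weights(train, k=3)
queries = np.array([[0.45, 0.45], [3.0, 3.0]])       # inside vs. far from data
gaps = knn_coverage_gap(train, queries, k=3)
print(weights[:100].max(), weights[100:].min(), gaps)
```

In this toy setup the same gap score plays both roles described above: evaluated on a query it flags regions far from the training data, and evaluated on the training set it yields sampling weights that favor the isolated points. Used online, such a score could in principle act as an exploration bonus, though how the paper's predictors are turned into curiosity rewards is not stated here.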