Hallucination in World Models Is Predictable and Preventable

Abstract

Modern generative world models render increasingly realistic, action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, and that lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes (perceptual, action-marginalized, and scene-diverging), each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation. An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2
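
To make the idea of a coverage signal concrete, the sketch below scores how sparsely a region of state-action space is covered and turns those scores into sampling weights. It is only an illustration under stated assumptions: the abstract does not specify the paper's three signals or its sampling technique, so the k-nearest-neighbor distance used here is a swapped-in stand-in, and the names `knn_novelty` and `coverage_weights`, the toy 2-D features, and the choice of `k` are all hypothetical.

```python
import math
import random


def knn_novelty(query, points, k=3):
    """Mean Euclidean distance from `query` to its k nearest neighbors.

    Assumed stand-in for a coverage signal: a larger value means the query
    lies in a sparser (lower-coverage) region of the data.
    """
    if not points:
        raise ValueError("points must be non-empty")
    dists = sorted(math.dist(query, p) for p in points)
    nearest = dists[:k]  # fewer than k neighbors is fine
    return sum(nearest) / len(nearest)


def coverage_weights(points, k=3):
    """Sampling weights proportional to each point's novelty w.r.t. the rest."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = [
        knn_novelty(p, points[:i] + points[i + 1:], k)
        for i, p in enumerate(points)
    ]
    total = sum(scores)
    if total == 0.0:  # all points identical: fall back to uniform sampling
        return [1.0 / len(points)] * len(points)
    return [s / total for s in scores]


# Toy state-action features: a dense cluster plus one rarely visited point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
weights = coverage_weights(data, k=2)

# Draw a training batch that oversamples the low-coverage point.
rng = random.Random(0)
batch = rng.choices(data, weights=weights, k=8)
```

Under the same assumption, the novelty score of a newly visited state-action pair could also be used as a curiosity-style bonus during online data collection, which mirrors the abstract's point that one signal can serve both detection and mitigation.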