MWM: Mobile World Models for Action-Conditioned Consistent Prediction
March 8, 2026
Authors: Han Yan, Zishang Xiang, Zeyu Zhang, Hao Tang
cs.AI
Abstract
World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift over multi-step rollouts and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Our experiments on benchmarks and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.