MWM: Mobile World Models for Action-Conditioned Consistent Prediction

Abstract

World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency: predictions that look visually plausible can still drift over multi-step rollouts and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, which creates a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD), which performs few-step diffusion distillation while improving rollout consistency. Experiments on benchmark and real-world tasks show consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
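The toy sketch below shows why multi-step rollout consistency matters for planning with a world model. It is not MWM's architecture or training method. It assumes a 2-D point robot whose true dynamics add the action to the position. The hypothetical learned model is correct except for a small constant per-step bias. The planner is a generic random-shooting planner used as a stand-in. It samples action sequences, rolls each one out autoregressively in the model, and keeps the sequence whose predicted final state lands closest to the goal. All function names (`true_step`, `make_model`, `rollout`, `plan`) are illustrative.

```python
import math
import random

def true_step(state, action):
    # Assumed true dynamics: the action is a displacement added to the position.
    return (state[0] + action[0], state[1] + action[1])

def make_model(bias):
    # Hypothetical imperfect world model: exact except for a constant per-step bias.
    def model_step(state, action):
        return (state[0] + action[0] + bias[0], state[1] + action[1] + bias[1])
    return model_step

def rollout(step_fn, state, actions):
    # Autoregressive rollout: each prediction is fed back as the next input state.
    states = [state]
    for a in actions:
        state = step_fn(state, a)
        states.append(state)
    return states

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def plan(step_fn, start, goal, horizon=5, num_samples=500, seed=0):
    # Random-shooting planner: score each sampled action sequence by the
    # distance between its imagined final state and the goal; keep the best.
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        cost = dist(rollout(step_fn, start, actions)[-1], goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost

# A small one-step error compounds linearly over the rollout.
bias = (0.05, 0.0)
model = make_model(bias)
acts = [(0.5, 0.0)] * 10
true_traj = rollout(true_step, (0.0, 0.0), acts)
pred_traj = rollout(model, (0.0, 0.0), acts)
drift = [dist(p, q) for p, q in zip(pred_traj, true_traj)]

# Plan in imagination, then execute the chosen actions in the true dynamics.
start, goal = (0.0, 0.0), (2.0, 0.0)
best_actions, imagined_cost = plan(model, start, goal)
model_final = rollout(model, start, best_actions)[-1]
true_final = rollout(true_step, start, best_actions)[-1]
true_cost = dist(true_final, goal)
```

In this toy, `drift[k]` equals `0.05 * k` (up to floating-point error). After 5 planning steps, the real final position is offset from the imagined one by `5 * bias`. A plan that looks good in imagination can therefore miss the goal. This is the kind of multi-step rollout drift that the abstract says degrades planning.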