MagicWorld: Interactive Geometry-driven Video World Exploration

November 24, 2025
Authors: Guangyuan Li, Siming Zheng, Shuolin Xu, Jinwei Chen, Bo Li, Xiaobin Hu, Lei Zhao, Peng-Tao Jiang
cs.AI

Abstract

Recent interactive video world models generate scene evolution conditioned on user instructions. Although they achieve impressive results, two key limitations remain. First, they fail to fully exploit the correspondence between instruction-driven scene motion and the underlying 3D geometry, which leads to structural instability under viewpoint changes. Second, they easily forget historical information during multi-step interaction, which causes errors to accumulate and scene semantics and structure to drift progressively. To address these issues, we propose MagicWorld, an interactive video world model that integrates 3D geometric priors and historical retrieval. MagicWorld starts from a single scene image, uses user actions to drive dynamic scene evolution, and autoregressively synthesizes continuous scenes. We introduce the Action-Guided 3D Geometry Module (AG3D), which constructs a point cloud from the first frame of each interaction and the corresponding action, providing explicit geometric constraints for viewpoint transitions and thereby improving structural consistency. We further propose a History Cache Retrieval (HCR) mechanism, which retrieves relevant historical frames during generation and injects them as conditioning signals, helping the model use past scene information and mitigate error accumulation. Experimental results demonstrate that MagicWorld achieves notable improvements in scene stability and continuity across interaction iterations.
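To make the retrieval idea behind HCR concrete, here is a minimal sketch of a bounded history cache that returns the stored frames most similar to a query. It is an illustration only: the abstract does not say how relevance is scored or how frames are represented, so the `HistoryCache` class, the per-frame feature vectors, the cosine-similarity ranking, and the top-k selection are all assumptions, not the authors' implementation.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class HistoryCache:
    """Hypothetical bounded cache of (frame_id, feature) pairs from past interactions."""

    def __init__(self, capacity=64):
        # A deque with maxlen evicts the oldest entry once the cache is full.
        self.entries = deque(maxlen=capacity)

    def add(self, frame_id, feature):
        self.entries.append((frame_id, list(feature)))

    def retrieve(self, query, k=2):
        """Return the ids of the k cached frames most similar to the query feature."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return [frame_id for frame_id, _ in ranked[:k]]


cache = HistoryCache(capacity=4)
cache.add("step0_frame", [1.0, 0.0, 0.0])
cache.add("step1_frame", [0.0, 1.0, 0.0])
cache.add("step2_frame", [0.9, 0.1, 0.0])

# Assumed usage: the retrieved frames would be passed to the generator
# as extra conditioning for the next interaction step.
retrieved = cache.retrieve([1.0, 0.0, 0.0], k=2)
print(retrieved)
```

In this toy setup the query is closest to the frames from steps 0 and 2, so those are the ones that would be injected as conditioning. A real system would presumably derive the features from a learned encoder rather than hand-written vectors.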