Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
February 2, 2026
Authors: Ruiqi Wu, Xuanhua He, Meng Cheng, Tianyu Yang, Yong Zhang, Zhuoliang Kang, Xunliang Cai, Xiaoming Wei, Chunle Guo, Chongyi Li, Ming-Ming Cheng
cs.AI
Abstract
We propose Infinite-World, a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments. While existing world models can be efficiently optimized on synthetic data with perfect ground truth, they lack an effective training paradigm for real-world videos due to noisy pose estimations and the scarcity of viewpoint revisits. To bridge this gap, we first introduce a Hierarchical Pose-free Memory Compressor (HPMC) that recursively distills historical latents into a fixed-budget representation. By jointly optimizing the compressor with the generative backbone, HPMC enables the model to autonomously anchor generations in the distant past with bounded computational cost, eliminating the need for explicit geometric priors. Second, we propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a tri-state logic. This strategy maximizes the utilization of raw video data while shielding the deterministic action space from being corrupted by noisy trajectories, ensuring robust action-response learning. Furthermore, guided by insights from a pilot toy study, we employ a Revisit-Dense Finetuning Strategy using a compact, 30-minute dataset to efficiently activate the model's long-range loop-closure capabilities. Extensive experiments, including objective metrics and user studies, demonstrate that Infinite-World achieves superior performance in visual quality, action controllability, and spatial consistency.
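As an illustration only, here is a minimal Python sketch of a fixed-budget, recursively compressed memory in the spirit of HPMC. It is not the authors' method. The abstract states that the compressor is learned jointly with the generative backbone, but this sketch uses fixed pairwise average pooling in place of that learned compression. The names `HierarchicalMemory`, `pool_pairs`, `compress`, `budget`, and `recent` are hypothetical. The sketch shows two properties: the memory never exceeds `budget` tokens however many frames are generated, and older latents are summarized more coarsely than the `recent` ones, which are kept exact.

```python
from typing import List, Sequence

Vector = List[float]


def pool_pairs(tokens: Sequence[Vector]) -> List[Vector]:
    """Average adjacent token pairs, roughly halving the count (an odd last token is kept)."""
    pooled = []
    for i in range(0, len(tokens) - 1, 2):
        pooled.append([(a + b) / 2.0 for a, b in zip(tokens[i], tokens[i + 1])])
    if len(tokens) % 2 == 1:
        pooled.append(list(tokens[-1]))
    return pooled


def compress(tokens: Sequence[Vector], budget: int) -> List[Vector]:
    """Recursively pool until the token count fits within `budget`.

    Stand-in for a learned compressor (assumption): fixed averaging, no parameters.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if len(tokens) <= budget:
        return list(tokens)
    return compress(pool_pairs(tokens), budget)


class HierarchicalMemory:
    """Hypothetical fixed-budget memory: recent latents stay exact, older ones get pooled."""

    def __init__(self, budget: int, recent: int) -> None:
        if not 0 < recent < budget:
            raise ValueError("need 0 < recent < budget")
        self.budget = budget
        self.recent = recent
        self.tokens: List[Vector] = []

    def update(self, chunk: Sequence[Vector]) -> List[Vector]:
        """Append new latents, then re-compress the older history to respect the budget."""
        tokens = self.tokens + [list(t) for t in chunk]
        if len(tokens) > self.budget:
            history, latest = tokens[:-self.recent], tokens[-self.recent:]
            # Previously pooled history is pooled again, so older content gets coarser.
            tokens = compress(history, self.budget - self.recent) + latest
        self.tokens = tokens
        return self.tokens
```

Each update caps the memory at `budget` tokens, so the per-step cost of conditioning on it stays bounded regardless of sequence length. A learned variant would replace `pool_pairs` with trainable layers optimized together with the generator.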
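One possible reading of the tri-state action labeling is a dead-zone quantizer. This reading is assumed here and is not confirmed by the abstract. On each motion axis, the labeler assigns +1 or -1 to a displacement estimated from noisy poses only when its magnitude clearly exceeds a threshold, and assigns 0 otherwise. Small, unreliable motions therefore never become spurious directional commands, while the clip itself remains usable for training. The function names and the per-axis thresholds `taus` are hypothetical.

```python
from typing import List, Sequence


def tri_state(displacement: float, tau: float) -> int:
    """Return +1 or -1 for a confident move along one axis, 0 when |displacement| <= tau."""
    if displacement > tau:
        return 1
    if displacement < -tau:
        return -1
    return 0  # too small to trust given noisy pose estimates: labeled neutral


def label_actions(displacements: Sequence[Sequence[float]],
                  taus: Sequence[float]) -> List[List[int]]:
    """Label every frame's per-axis motion with a tri-state action in {-1, 0, +1}.

    displacements: one row per frame, one estimated displacement per axis.
    taus: hypothetical per-axis dead-zone thresholds (assumption, not from the paper).
    """
    labels = []
    for frame in displacements:
        if len(frame) != len(taus):
            raise ValueError("each frame needs one displacement per threshold")
        labels.append([tri_state(d, tau) for d, tau in zip(frame, taus)])
    return labels
```

Suppose the thresholds are `taus = [0.1, 0.05]`, standing for forward translation and yaw. Then `label_actions([[0.5, -0.01], [-0.3, 0.2]], [0.1, 0.05])` yields `[[1, 0], [-1, 1]]`. The small yaw jitter in the first frame is labeled neutral rather than as a turn.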