Yume: An Interactive World Generation Model
July 23, 2025
Authors: Xiaofeng Mao, Shaoheng Lin, Zhen Li, Chuanhao Li, Wenshuo Peng, Tong He, Jiangmiao Pang, Mingmin Chi, Yu Qiao, Kaipeng Zhang
cs.AI
Abstract
Yume aims to use images, text, or videos to create an interactive, realistic,
and dynamic world that allows exploration and control through peripheral
devices or neural signals. In this report, we present a preview version of
Yume, which creates a dynamic world from an input image and allows
exploration of that world using keyboard actions. To achieve this high-fidelity
and interactive video world generation, we introduce a well-designed framework
with four main components: camera motion quantization, the video generation
architecture, an advanced sampler, and model acceleration. First, we quantize
camera motions for stable training and user-friendly interaction via keyboard
inputs. Then, we introduce the Masked Video Diffusion Transformer (MVDT) with
a memory module for infinite video generation in an autoregressive manner.
After that, a training-free Anti-Artifact Mechanism (AAM) and Time Travel
Sampling based on Stochastic Differential Equations (TTS-SDE) are introduced
into the sampler for better visual quality and more precise control. Moreover,
we investigate model acceleration through the synergistic optimization of
adversarial distillation and caching mechanisms. We train Yume on the
high-quality world exploration dataset Sekai, and it achieves remarkable
results across diverse scenes and applications. All data, the codebase, and
model weights are available at https://github.com/stdstu12/YUME. Yume will
be updated monthly to achieve its original goal. Project page:
https://stdstu12.github.io/YUME-Project/.
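To make the camera-motion-quantization idea concrete, here is a minimal sketch of how continuous per-frame camera deltas could be snapped to a small discrete action vocabulary, so that each action corresponds to a keyboard key. The thresholds, action set, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: quantize a continuous camera delta into one
# dominant discrete action suitable for keyboard-driven control.
# All names and threshold values are assumptions for illustration.

ACTIONS = {
    "forward": "W", "backward": "S",
    "left": "A", "right": "D",
    "look_left": "LEFT", "look_right": "RIGHT",
    "stay": "NONE",
}

def quantize_motion(dx, dz, dyaw, t_move=0.05, t_turn=0.02):
    """Map a continuous camera delta (lateral dx, forward dz, yaw dyaw)
    to the single dominant discrete action.

    Yaw is rescaled by t_move / t_turn so that translation and rotation
    magnitudes are compared on a common scale.
    """
    candidates = {
        "forward": dz, "backward": -dz,
        "right": dx, "left": -dx,
        "look_right": dyaw * (t_move / t_turn),
        "look_left": -dyaw * (t_move / t_turn),
    }
    name, score = max(candidates.items(), key=lambda kv: kv[1])
    return name if score > t_move else "stay"

print(quantize_motion(0.01, 0.20, 0.0))   # dominant forward translation
print(quantize_motion(0.0, 0.0, -0.10))   # dominant leftward turn
print(quantize_motion(0.0, 0.0, 0.0))     # below threshold: no action
```

A discrete vocabulary like this is what makes training stable and interaction keyboard-friendly: the model conditions on a small set of action tokens rather than on noisy continuous camera trajectories.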