Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

August 18, 2025
Authors: Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Cyrus Wu, Wei Li, Xuchen Song, Yang Liu, Eric Li, Yahui Zhou
cs.AI

Abstract

Recent advances in interactive video generation have demonstrated the potential of diffusion models as world models that capture complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, which severely limits real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that effectively produces massive amounts (about 1200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; and (3) a few-step distillation based on a causal architecture for real-time and streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.
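
To make the few-step auto-regressive idea more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `toy_denoiser`, `stream_frames`, the scalar action encoding, and every parameter value are hypothetical stand-ins chosen for illustration. The sketch shows only the control flow the abstract describes: each new frame starts from noise, is refined in a small fixed number of steps conditioned on past frames and the current action, and is emitted as soon as it is ready.

```python
import random
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[float]  # toy "frame": a short vector instead of an image


def toy_denoiser(noisy: Frame, context: List[Frame], action: float,
                 step: int, num_steps: int) -> Frame:
    """Hypothetical stand-in for a learned denoiser (assumption, not the paper's model).

    It pulls the sample toward (most recent context frame + action).
    """
    prev = context[-1] if context else [0.0] * len(noisy)
    target = [p + action for p in prev]
    # The blend weight reaches 1.0 on the last step, so the final sample equals the target.
    w = (step + 1) / num_steps
    return [(1.0 - w) * x + w * t for x, t in zip(noisy, target)]


def stream_frames(actions: Iterable[float], frame_dim: int = 4,
                  num_steps: int = 3, context_len: int = 8,
                  denoiser: Callable[..., Frame] = toy_denoiser,
                  seed: int = 0) -> Iterator[Frame]:
    """Yield one frame per action, using only past frames as context (causal)."""
    rng = random.Random(seed)
    context: Deque[Frame] = deque(maxlen=context_len)  # rolling history of past frames
    for action in actions:
        # Each new frame starts from Gaussian noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(frame_dim)]
        # Few-step refinement, conditioned on history and the current action.
        for step in range(num_steps):
            x = denoiser(x, list(context), action, step, num_steps)
        context.append(x)
        yield x  # streaming: the frame is available before later actions are read


if __name__ == "__main__":
    for i, frame in enumerate(stream_frames([1.0, 1.0, -0.5])):
        print(i, frame)
```

Because `stream_frames` is a generator and each frame depends only on earlier frames and the current action, a caller can feed actions one at a time and display frames as they arrive. A bidirectional model would instead need the whole sequence before producing any output.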