Matrix-Game: Interactive World Foundation Model

June 23, 2025
Authors: Yifan Zhang, Chunli Peng, Boyang Wang, Puyi Wang, Qingcheng Zhu, Fei Kang, Biao Jiang, Zedong Gao, Eric Li, Yang Liu, Yahui Zhou
cs.AI

Abstract

We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips and over 1,000 hours of high-quality labeled clips with fine-grained keyboard and mouse action annotations. Our model adopts a controllable image-to-world generation paradigm, conditioned on a reference image, motion context, and user actions. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence. To evaluate performance, we develop GameWorld Score, a unified benchmark measuring visual quality, temporal quality, action controllability, and physical rule understanding for Minecraft world generation. Extensive experiments show that Matrix-Game consistently outperforms prior open-source Minecraft world models (including Oasis and MineWorld) across all metrics, with particularly strong gains in controllability and physical consistency. Double-blind human evaluations further confirm the superiority of Matrix-Game, highlighting its ability to generate perceptually realistic and precisely controllable videos across diverse game scenarios. To facilitate future research on interactive image-to-world generation, we will open-source the Matrix-Game model weights and the GameWorld Score benchmark at https://github.com/SkyworkAI/Matrix-Game.
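
The abstract says generation is conditioned on a reference image, motion context, and user actions annotated at the keyboard and mouse level. As a rough illustration only, the Python sketch below shows one way such conditioning inputs could be packaged. Every name here (`KEYS`, `Action`, `GenerationRequest`, `encode_actions`), the key set, and the vector layout are hypothetical and are not taken from the Matrix-Game code base.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical key set: the abstract only mentions "keyboard and mouse" actions.
KEYS = ("forward", "back", "left", "right", "jump", "attack")


@dataclass
class Action:
    """One per-frame user action: pressed keys plus a mouse (camera) delta."""
    keys: FrozenSet[str] = frozenset()
    mouse_dx: float = 0.0
    mouse_dy: float = 0.0

    def to_vector(self) -> List[float]:
        # Reject keys outside the assumed vocabulary instead of silently dropping them.
        unknown = set(self.keys) - set(KEYS)
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        # Multi-hot encoding of the keys, followed by the two mouse deltas.
        pressed = [1.0 if k in self.keys else 0.0 for k in KEYS]
        return pressed + [self.mouse_dx, self.mouse_dy]


@dataclass
class GenerationRequest:
    """The three conditioning signals named in the abstract (assumed layout)."""
    reference_image: str        # path or ID of the reference image
    motion_context: List[str]   # paths or IDs of preceding frames
    actions: List[Action]       # one action per frame to generate


def encode_actions(request: GenerationRequest) -> List[List[float]]:
    """Turn the action sequence into one fixed-length vector per frame."""
    return [action.to_vector() for action in request.actions]


if __name__ == "__main__":
    request = GenerationRequest(
        reference_image="ref.png",
        motion_context=["frame_0.png"],
        actions=[Action(frozenset({"forward", "jump"}), 2.5, -1.0), Action()],
    )
    for vector in encode_actions(request):
        print(vector)
```

Keeping key presses as a multi-hot vector and camera motion as continuous deltas is one common way to mix discrete and continuous controls; the actual model may encode actions differently.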