Matrix-Game: Interactive World Foundation Model
June 23, 2025
Authors: Yifan Zhang, Chunli Peng, Boyang Wang, Puyi Wang, Qingcheng Zhu, Fei Kang, Biao Jiang, Zedong Gao, Eric Li, Yang Liu, Yahui Zhou
cs.AI
Abstract
We introduce Matrix-Game, an interactive world foundation model for
controllable game world generation. Matrix-Game is trained using a two-stage
pipeline that first performs large-scale unlabeled pretraining for environment
understanding, followed by action-labeled training for interactive video
generation. To support this, we curate Matrix-Game-MC, a comprehensive
Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips
and over 1,000 hours of high-quality labeled clips with fine-grained keyboard
and mouse action annotations. Our model adopts a controllable image-to-world
generation paradigm, conditioned on a reference image, motion context, and user
actions. With over 17 billion parameters, Matrix-Game enables precise control
over character actions and camera movements, while maintaining high visual
quality and temporal coherence. To evaluate performance, we develop GameWorld
Score, a unified benchmark measuring visual quality, temporal quality, action
controllability, and physical rule understanding for Minecraft world
generation. Extensive experiments show that Matrix-Game consistently
outperforms prior open-source Minecraft world models (including Oasis and
MineWorld) across all metrics, with particularly strong gains in
controllability and physical consistency. Double-blind human evaluations
further confirm the superiority of Matrix-Game, highlighting its ability to
generate perceptually realistic and precisely controllable videos across
diverse game scenarios. To facilitate future research on interactive
image-to-world generation, we will open-source the Matrix-Game model weights
and the GameWorld Score benchmark at https://github.com/SkyworkAI/Matrix-Game.
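
The conditioning described above (a reference image, motion context, and user actions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the released Matrix-Game API: the `Action` layout, the `rollout` helper, the clip and context lengths, the choice to take motion context from the most recently generated frames, and `dummy_generate`, which merely stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Placeholder frame type; a real frame would be an image array.
Frame = List[float]


@dataclass
class Action:
    """One step of user input (hypothetical layout, not the paper's format)."""
    keys: Sequence[str]       # keys held down, e.g. ("forward", "jump")
    mouse_dx: float = 0.0     # horizontal camera movement
    mouse_dy: float = 0.0     # vertical camera movement


# Hypothetical generator signature:
# (reference image, motion context, actions) -> new frames.
Generator = Callable[[Frame, List[Frame], List[Action]], List[Frame]]


def rollout(generate: Generator, reference: Frame, actions: List[Action],
            clip_len: int = 4, context_len: int = 2) -> List[Frame]:
    """Generate a video clip by clip, conditioning each clip on the reference
    image, the most recent frames (motion context), and that clip's actions."""
    video: List[Frame] = []
    context: List[Frame] = [reference]   # no motion yet: start from the reference
    for start in range(0, len(actions), clip_len):
        chunk = actions[start:start + clip_len]
        video.extend(generate(reference, context, chunk))
        context = video[-context_len:]   # assumed: context = last generated frames
    return video


def dummy_generate(reference: Frame, context: List[Frame],
                   actions: List[Action]) -> List[Frame]:
    """Stand-in for the model: shifts the last context frame by the mouse input."""
    frames: List[Frame] = []
    last = context[-1]
    for action in actions:
        last = [value + action.mouse_dx for value in last]
        frames.append(last)
    return frames


actions = [Action(keys=("forward",), mouse_dx=1.0) for _ in range(6)]
video = rollout(dummy_generate, [0.0, 0.0], actions)
print(len(video), video[-1])
```

Swapping `dummy_generate` for a real model call would leave `rollout` unchanged, which is the point of keeping the three conditioning inputs explicit in the signature.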