Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

August 18, 2025
Authors: Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Cyrus Wu, Wei Li, Xuchen Song, Yang Liu, Eric Li, Yahui Zhou
cs.AI

Abstract

Recent advances in interactive video generation have demonstrated the potential of diffusion models as world models that capture complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, which severely limits real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that efficiently produces massive amounts (about 1200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; and (3) few-step distillation based on a causal architecture for real-time, streaming video generation. Matrix-Game 2.0 can generate high-quality, minute-level videos across diverse scenes at an ultra-fast 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.
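The abstract does not describe the model internals, so the following is only a toy, stdlib-only sketch of the general idea behind causal, few-step auto-regressive generation with frame-level action conditioning. Every name and rule in it (`toy_denoiser`, `encode_action`, `KEYS`, `LATENT_DIM`, `NUM_STEPS`, `CONTEXT_LEN`, the averaging update) is hypothetical and is not Matrix-Game 2.0's implementation. Only the 25 FPS figure comes from the abstract. The sketch also works out what that figure implies: a 40 ms budget per frame and 1500 frames per minute of video.

```python
import random
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Sequence

FPS = 25                        # frame rate reported in the abstract
FRAME_BUDGET_MS = 1000 / FPS    # 40 ms per frame to keep up in real time
FRAMES_PER_MINUTE = 60 * FPS    # 1500 frames for one minute of video

LATENT_DIM = 8      # hypothetical toy latent size
NUM_STEPS = 4       # hypothetical "few-step" sampler length
CONTEXT_LEN = 6     # hypothetical bounded history window (stands in for a cache)
KEYS = ("W", "A", "S", "D", "space")  # hypothetical keyboard vocabulary


def encode_action(mouse: Sequence[float], keys: Sequence[str]) -> List[float]:
    """Flatten one frame's mouse delta and pressed keys into a condition vector."""
    return [float(mouse[0]), float(mouse[1])] + [1.0 if k in keys else 0.0 for k in KEYS]


def toy_denoiser(x: List[float], context: Sequence[List[float]],
                 action: List[float], step: int) -> List[float]:
    """Hypothetical stand-in for a learned causal denoiser.

    Pulls the noisy latent toward the mean of *past* frames plus an
    action-dependent offset. It never sees future frames (causal).
    """
    if context:
        prior = [sum(f[i] for f in context) / len(context) for i in range(LATENT_DIM)]
    else:
        prior = [0.0] * LATENT_DIM
    shift = sum(action) * 0.01
    target = [p + shift for p in prior]
    alpha = (step + 1) / NUM_STEPS  # later steps move closer to the target
    return [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]


def stream_frames(actions: Iterable[Dict], seed: int = 0) -> Iterator[List[float]]:
    """Generate one latent frame per incoming action and yield it immediately."""
    rng = random.Random(seed)
    context: Deque[List[float]] = deque(maxlen=CONTEXT_LEN)
    for act in actions:
        cond = encode_action(act["mouse"], act["keys"])
        x = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # start from noise
        for step in range(NUM_STEPS):                          # few denoising steps
            x = toy_denoiser(x, context, cond, step)
        context.append(x)  # finished frame becomes history for the next frame
        yield x            # streamed out before the next action arrives


if __name__ == "__main__":
    acts = [{"mouse": (1.0, 0.0), "keys": ["W"]} for _ in range(3)]
    frames = list(stream_frames(acts))
    print(len(frames), FRAME_BUDGET_MS, FRAMES_PER_MINUTE)  # 3 40.0 1500
```

Because each frame reads only earlier frames, it can be emitted as soon as its few denoising steps finish, whereas a model with bidirectional attention over the whole clip cannot finalize a frame before later frames exist. The bounded `deque` keeps per-frame cost constant as the video grows, which is one plausible way to keep a fixed time budget over minute-long rollouts.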
August 19, 2025