Yume: An Interactive World Generation Model

July 23, 2025
作者: Xiaofeng Mao, Shaoheng Lin, Zhen Li, Chuanhao Li, Wenshuo Peng, Tong He, Jiangmiao Pang, Mingmin Chi, Yu Qiao, Kaipeng Zhang
cs.AI

Abstract

Yume aims to use images, text, or videos to create an interactive, realistic, and dynamic world that allows exploration and control using peripheral devices or neural signals. In this report, we present a preview version of Yume, which creates a dynamic world from an input image and allows exploration of that world using keyboard actions. To achieve this high-fidelity and interactive video world generation, we introduce a well-designed framework consisting of four main components: camera motion quantization, a video generation architecture, an advanced sampler, and model acceleration. First, we quantize camera motions for stable training and user-friendly interaction via keyboard inputs. Then, we introduce the Masked Video Diffusion Transformer (MVDT) with a memory module for infinite video generation in an autoregressive manner. After that, a training-free Anti-Artifact Mechanism (AAM) and Time Travel Sampling based on Stochastic Differential Equations (TTS-SDE) are introduced into the sampler for better visual quality and more precise control. Moreover, we investigate model acceleration through the synergistic optimization of adversarial distillation and caching mechanisms. We train Yume on the high-quality world exploration dataset Sekai, and it achieves remarkable results across diverse scenes and applications. All data, the codebase, and model weights are available at https://github.com/stdstu12/YUME. Yume will be updated monthly to achieve its original goal. Project page: https://stdstu12.github.io/YUME-Project/.
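The abstract's first component, camera motion quantization, maps continuous camera movement onto a small discrete action vocabulary so that generation can be driven by keyboard keys. A minimal illustrative sketch of that idea is below; the thresholds, action names, and the dominant-axis heuristic are assumptions for illustration, not Yume's actual scheme.

```python
def quantize_camera_motion(dx, dz, dyaw, thresh=0.05):
    """Map continuous per-frame camera deltas to a discrete keyboard-style action.

    dx:   lateral translation (right positive)
    dz:   forward translation (forward positive)
    dyaw: rotation about the vertical axis (left positive, radians)

    Hypothetical sketch: pick the dominant motion component above a
    dead-zone threshold and return its associated key.
    """
    candidates = {
        "W": dz, "S": -dz,             # forward / backward
        "D": dx, "A": -dx,             # strafe right / left
        "LEFT": dyaw, "RIGHT": -dyaw,  # turn left / right
    }
    key, magnitude = max(candidates.items(), key=lambda kv: kv[1])
    return key if magnitude > thresh else "IDLE"

print(quantize_camera_motion(0.0, 0.3, 0.01))  # dominant forward motion -> "W"
print(quantize_camera_motion(0.0, 0.0, 0.0))   # below dead-zone -> "IDLE"
```

Discretizing motion this way gives each training clip a stable, low-cardinality action label and lets a user's key presses condition the autoregressive generation directly.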