From Virtual Games to Real-World Play
June 23, 2025
Authors: Wenqiang Sun, Fangyun Wei, Jinjing Zhao, Xi Chen, Zilong Chen, Hongyang Zhang, Jun Zhang, Yan Lu
cs.AI
Abstract
We introduce RealPlay, a neural network-based real-world game engine that
enables interactive video generation from user control signals. Unlike prior
works focused on game-style visuals, RealPlay aims to produce photorealistic,
temporally consistent video sequences that resemble real-world footage. It
operates in an interactive loop: users observe a generated scene, issue a
control command, and receive a short video chunk in response. To enable such
realistic and responsive generation, we address key challenges including
iterative chunk-wise prediction for low-latency feedback, temporal consistency
across iterations, and accurate control response. RealPlay is trained on a
combination of labeled game data and unlabeled real-world videos, without
requiring real-world action annotations. Notably, we observe two forms of
generalization: (1) control transfer: RealPlay effectively maps control signals
from virtual to real-world scenarios; and (2) entity transfer: although training
labels originate solely from a car racing game, RealPlay generalizes to control
diverse real-world entities beyond vehicles, including bicycles and pedestrians.
The project page is available at: https://wenqsun.github.io/RealPlay/
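The interactive loop described above (observe a scene, issue a command, receive a short video chunk, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_chunk`, the action set, the chunk length, and the toy frame resolution are all hypothetical stand-ins for the neural generator.

```python
import numpy as np

CHUNK_FRAMES = 8           # frames returned per interaction step (assumed)
FRAME_SHAPE = (64, 64, 3)  # toy resolution for illustration

def generate_chunk(context, action, rng):
    """Stand-in for the neural generator: conditions on prior frames
    (for temporal consistency across iterations) and on the control
    signal, and returns one short chunk of frames."""
    last = context[-1]
    shift = {"left": -1, "right": 1, "forward": 0}[action]
    frames = []
    for _ in range(CHUNK_FRAMES):
        # Toy dynamics: shift the image per the command, add small noise.
        last = np.roll(last, shift, axis=1) + rng.normal(0, 0.01, FRAME_SHAPE)
        frames.append(last)
    return np.stack(frames)

def interactive_loop(actions):
    """Iterative chunk-wise prediction: each user command yields one
    chunk with low latency, and the chunk is appended to the context
    so the next iteration stays temporally consistent."""
    rng = np.random.default_rng(0)
    context = [np.zeros(FRAME_SHAPE)]  # initial observed scene
    video = []
    for action in actions:
        chunk = generate_chunk(context, action, rng)
        video.append(chunk)           # returned to the user immediately
        context.extend(chunk)         # history for the next iteration
    return np.concatenate(video)

video = interactive_loop(["forward", "left", "right"])
print(video.shape)  # (24, 64, 64, 3): 3 commands x 8 frames each
```

The key design point mirrored here is that generation is chunk-wise rather than whole-video: each command produces only a short clip, which keeps feedback latency low, while conditioning on accumulated context preserves consistency across iterations.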