Diffusion Models Are Real-Time Game Engines
August 27, 2024
Authors: Dani Valevski, Yaniv Leviathan, Moab Arar, Shlomi Fruchter
cs.AI
Abstract
We present GameNGen, the first game engine powered entirely by a neural model
that enables real-time interaction with a complex environment over long
trajectories at high quality. GameNGen can interactively simulate the classic
game DOOM at over 20 frames per second on a single TPU. Next frame prediction
achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are
only slightly better than random chance at distinguishing short clips of the
game from clips of the simulation. GameNGen is trained in two phases: (1) an
RL-agent learns to play the game and the training sessions are recorded, and
(2) a diffusion model is trained to produce the next frame, conditioned on the
sequence of past frames and actions. Conditioning augmentations enable stable
auto-regressive generation over long trajectories.
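
To give a sense of scale for the reported PSNR of 29.4, the following is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE). It is not the authors' evaluation code. The function names `psnr` and `rmse_for_psnr`, the flat-list frame representation, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state the pixel range used.

```python
import math


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `max_value` is the assumed peak pixel intensity (255 for 8-bit images).
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_for_psnr(psnr_db, max_value=255.0):
    """Invert the formula: root-mean-square pixel error implied by a PSNR value."""
    return max_value / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    # Hypothetical 4-pixel "frames" with a constant error of 8.64 intensity levels.
    reference = [100.0, 100.0, 100.0, 100.0]
    predicted = [108.64, 108.64, 108.64, 108.64]
    print(round(psnr(reference, predicted), 1))
    print(round(rmse_for_psnr(29.4), 2))
```

Under the 8-bit assumption, a PSNR of 29.4 dB corresponds to a root-mean-square error of about 8.6 intensity levels per pixel value.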