Diffusion for World Modeling: Visual Details Matter in Atari
May 20, 2024
Authors: Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret
cs.AI
Abstract
World models constitute a promising approach for training reinforcement
learning agents in a safe and sample-efficient manner. Recent world models
predominantly operate on sequences of discrete latent variables to model
environment dynamics. However, this compression into a compact discrete
representation may ignore visual details that are important for reinforcement
learning. Concurrently, diffusion models have become a dominant approach for
image generation, challenging well-established methods modeling discrete
latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a
Model Of eNvironment Dreams), a reinforcement learning agent trained in a
diffusion world model. We analyze the key design choices that are required to
make diffusion suitable for world modeling, and demonstrate how improved visual
details can lead to improved agent performance. DIAMOND achieves a mean
human-normalized score of 1.46 on the competitive Atari 100k benchmark, a new best
for agents trained entirely within a world model. To foster future research on
diffusion for world modeling, we release our code, agents and playable world
models at https://github.com/eloialonso/diamond.
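
For readers unfamiliar with the metric quoted in the abstract, the sketch below shows how a mean human-normalized score is conventionally computed: each game's agent score is rescaled so that a random policy maps to 0 and the human reference maps to 1, and the rescaled values are then averaged over games. The function names and the two example games are hypothetical placeholders chosen for illustration. They are not taken from the DIAMOND code base, and the numbers are not real Atari results.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw game score so that random play is 0 and human play is 1."""
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return (agent - random) / (human - random)


def mean_human_normalized_score(scores: dict[str, tuple[float, float, float]]) -> float:
    """Average the per-game normalized scores.

    `scores` maps a game name to (agent, random, human) raw scores.
    """
    return mean(human_normalized_score(*triple) for triple in scores.values())


# Made-up numbers for two placeholder games (NOT real benchmark data).
example_scores = {
    "game_a": (50.0, 0.0, 100.0),   # agent is halfway between random and human
    "game_b": (30.0, 10.0, 20.0),   # agent is well above the human reference
}

if __name__ == "__main__":
    for game, triple in example_scores.items():
        print(game, human_normalized_score(*triple))
    print("mean HNS:", mean_human_normalized_score(example_scores))
```

Because the mean is taken over per-game ratios, a few games with scores far above the human reference can dominate it, which is one reason results on this benchmark are often reported alongside other aggregates.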