Diffusion for World Modeling: Visual Details Matter in Atari
May 20, 2024
Authors: Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret
cs.AI
Abstract
World models constitute a promising approach for training reinforcement
learning agents in a safe and sample-efficient manner. Recent world models
predominantly operate on sequences of discrete latent variables to model
environment dynamics. However, this compression into a compact discrete
representation may ignore visual details that are important for reinforcement
learning. Concurrently, diffusion models have become a dominant approach for
image generation, challenging well-established methods modeling discrete
latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a
Model Of eNvironment Dreams), a reinforcement learning agent trained in a
diffusion world model. We analyze the key design choices that are required to
make diffusion suitable for world modeling, and demonstrate how improved visual
details can lead to improved agent performance. DIAMOND achieves a mean
human-normalized score of 1.46 on the competitive Atari 100k benchmark, a new best
for agents trained entirely within a world model. To foster future research on
diffusion for world modeling, we release our code, agents and playable world
models at https://github.com/eloialonso/diamond.
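
For readers unfamiliar with the metric, the sketch below shows how a mean human-normalized score is commonly computed on Atari benchmarks: each game's raw score is rescaled so that a random policy maps to 0 and the human reference maps to 1, and the per-game values are then averaged. The function names and the numbers are hypothetical placeholders chosen for illustration. They are not taken from the paper or from the DIAMOND repository.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw game score: 0.0 matches a random policy, 1.0 matches the human reference."""
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return (agent - random) / (human - random)


def mean_hns(results: dict) -> float:
    """Average the per-game normalized scores.

    `results` maps a game name to a tuple (agent, random, human).
    """
    return mean(human_normalized_score(*scores) for scores in results.values())


# Hypothetical placeholder values (agent, random, human); NOT results from the paper.
example = {
    "game_a": (30.0, 10.0, 20.0),  # agent is twice as far above random as the human
    "game_b": (15.0, 10.0, 20.0),  # agent reaches half of the human margin
}
example_mean = mean_hns(example)
```

Under this definition, a mean above 1 means the agent exceeds the human reference on average across games, although a few games with very high normalized scores can dominate the mean.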