Diffusion World Model

February 5, 2024
Authors: Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng
cs.AI

Abstract

We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a form of conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a 44% performance gain, and achieves state-of-the-art performance.
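
To make the idea of a simulated short-term return concrete, here is a minimal sketch of one common form of model-based value expansion: discount the rewards along a sampled trajectory and bootstrap with a value estimate at the final predicted state. The abstract does not spell out the exact target, so the formula, the function name `multistep_value_target`, the array shapes, and the default discount factor are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def multistep_value_target(rewards, terminal_value, gamma=0.99):
    """Hypothetical H-step value target built from a simulated trajectory.

    rewards:        shape (batch, H), rewards predicted along each sampled
                    future trajectory (assumed to come from a world model).
    terminal_value: shape (batch,), value estimate at the last predicted state.
    gamma:          discount factor (assumed value, not from the source).
    """
    rewards = np.asarray(rewards, dtype=float)
    terminal_value = np.asarray(terminal_value, dtype=float)
    horizon = rewards.shape[1]

    # Discount weights [1, gamma, ..., gamma^(H-1)], broadcast over the batch.
    discounts = gamma ** np.arange(horizon)

    # Short-term return: discounted sum of the simulated rewards.
    short_term_return = (rewards * discounts).sum(axis=1)

    # Bootstrap the remainder of the return from the terminal value estimate.
    return short_term_return + gamma ** horizon * terminal_value


# Toy usage: one trajectory with three simulated rewards.
example_target = multistep_value_target([[1.0, 1.0, 1.0]], [10.0], gamma=0.5)
```

In this sketch, the whole reward sequence is passed in at once, which mirrors the abstract's point that a multistep model yields the full horizon in a single call rather than through repeated one-step queries.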