Diffusion World Model
February 5, 2024
Authors: Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng
cs.AI
Abstract
We introduce Diffusion World Model (DWM), a conditional diffusion model
capable of predicting multistep future states and rewards concurrently. As
opposed to traditional one-step dynamics models, DWM offers long-horizon
predictions in a single forward pass, eliminating the need for recursive
queries. We integrate DWM into model-based value estimation, where the
short-term return is simulated by future trajectories sampled from DWM. In the
context of offline reinforcement learning, DWM can be viewed as a conservative
value regularization through generative modeling. Alternatively, it can be seen
as a data source that enables offline Q-learning with synthetic data. Our
experiments on the D4RL dataset confirm the robustness of DWM to long-horizon
simulation. In terms of absolute performance, DWM significantly surpasses
one-step dynamics models with a 44% performance gain, and achieves
state-of-the-art performance.
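
To make the value-estimation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard model-based value expansion target: the discounted sum of H simulated rewards plus a bootstrapped value at the final simulated state. The abstract states only that the short-term return is simulated from sampled trajectories; the terminal bootstrap, the function name `value_expansion_target`, and the hard-coded reward list standing in for a DWM sample are all illustrative assumptions.

```python
from typing import Sequence


def value_expansion_target(
    rewards: Sequence[float],
    terminal_value: float,
    gamma: float = 0.99,
) -> float:
    """Discounted sum of simulated rewards plus a bootstrapped terminal value.

    rewards:        H rewards from one simulated trajectory (assumed to come
                    from a single multistep prediction, not recursive queries).
    terminal_value: value estimate at the last simulated state (an assumption;
                    the abstract does not specify how the tail is handled).
    """
    target = 0.0
    discount = 1.0
    for reward in rewards:
        target += discount * reward  # adds gamma^t * r_t
        discount *= gamma
    # After the loop, discount equals gamma^H.
    return target + discount * terminal_value


# Hypothetical stand-in for one sampled trajectory with horizon H = 3.
simulated_rewards = [1.0, 0.5, 0.25]
target = value_expansion_target(simulated_rewards, terminal_value=2.0, gamma=0.5)
print(target)  # 1.5625
```

With an empty reward list the target reduces to the terminal value alone, which corresponds to ordinary model-free bootstrapping; longer horizons shift more of the target onto simulated rewards.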