Efficient World Models with Context-Aware Tokenization
June 27, 2024
Authors: Vincent Micheli, Eloi Alonso, François Fleuret
cs.AI
Abstract
Scaling up deep Reinforcement Learning (RL) methods presents a significant
challenge. Following developments in generative modelling, model-based RL
positions itself as a strong contender. Recent advances in sequence modelling
have led to effective transformer-based world models, albeit at the price of
heavy computations due to the long sequences of tokens required to accurately
simulate environments. In this work, we propose Delta-IRIS, a new agent with
a world model architecture composed of a discrete autoencoder that encodes
stochastic deltas between time steps and an autoregressive transformer that
predicts future deltas by summarizing the current state of the world with
continuous tokens. In the Crafter benchmark, Delta-IRIS sets a new state of
the art at multiple frame budgets, while being an order of magnitude faster to
train than previous attention-based approaches. We release our code and models
at https://github.com/vmicheli/delta-iris.
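To make the two-component architecture concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a discrete autoencoder that tokenizes only the stochastic delta between consecutive time steps, and an autoregressive transformer that summarizes each step with continuous tokens and predicts the next delta tokens. All module names, dimensions, and wiring here are illustrative assumptions, not the authors' implementation; the actual code is at https://github.com/vmicheli/delta-iris.

```python
# Illustrative sketch of a Delta-IRIS-style world model (assumptions throughout;
# see https://github.com/vmicheli/delta-iris for the authors' implementation).
import torch
import torch.nn as nn


class DeltaAutoencoder(nn.Module):
    """Discrete autoencoder: compresses the stochastic delta between x_t and
    x_{t+1} (given action a_t) into a few discrete tokens, rather than
    tokenizing the full frame x_{t+1}."""

    def __init__(self, frame_dim=256, action_dim=16, codebook_size=512,
                 n_delta_tokens=4, code_dim=64):
        super().__init__()
        self.n_delta_tokens, self.code_dim = n_delta_tokens, code_dim
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, n_delta_tokens * code_dim),
        )
        self.codebook = nn.Embedding(codebook_size, code_dim)
        self.decoder = nn.Sequential(
            nn.Linear(frame_dim + action_dim + n_delta_tokens * code_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def encode(self, x_t, a_t, x_next):
        z = self.encoder(torch.cat([x_t, a_t, x_next], dim=-1))
        z = z.view(-1, self.n_delta_tokens, self.code_dim)
        # Nearest-codebook-entry quantization (a real VQ autoencoder would add
        # a straight-through estimator and commitment losses during training).
        return torch.cdist(z, self.codebook.weight).argmin(dim=-1)

    def decode(self, x_t, a_t, delta_tokens):
        # Reconstruct x_{t+1} from the previous frame, action, and delta codes.
        codes = self.codebook(delta_tokens).flatten(1)
        return self.decoder(torch.cat([x_t, a_t, codes], dim=-1))


class DeltaTransformer(nn.Module):
    """Autoregressive transformer: each time step is summarized by continuous
    tokens (frame and action embeddings) instead of long sequences of discrete
    frame tokens; the model predicts the next step's delta tokens."""

    def __init__(self, frame_dim=256, action_dim=16, codebook_size=512,
                 n_delta_tokens=4, d_model=128):
        super().__init__()
        self.frame_proj = nn.Linear(frame_dim, d_model)    # continuous state token
        self.action_proj = nn.Linear(action_dim, d_model)  # continuous action token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.delta_head = nn.Linear(d_model, n_delta_tokens * codebook_size)
        self.n_delta_tokens, self.codebook_size = n_delta_tokens, codebook_size

    def forward(self, frames, actions):
        # frames: (B, T, frame_dim); actions: (B, T, action_dim)
        tokens = self.frame_proj(frames) + self.action_proj(actions)
        T = tokens.shape[1]
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.backbone(tokens, mask=causal)
        logits = self.delta_head(h)
        # (B, T, n_delta_tokens, codebook_size): per-step delta-token logits
        return logits.view(*logits.shape[:2], self.n_delta_tokens, self.codebook_size)
```

The point of the sketch is the sequence-length budget: because only the delta between steps is discretized, each time step contributes a handful of tokens plus a continuous summary, rather than the hundreds of frame tokens per step used by earlier transformer world models, which is consistent with the order-of-magnitude training speedup the abstract reports.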