Efficient World Models with Context-Aware Tokenization
June 27, 2024
Authors: Vincent Micheli, Eloi Alonso, François Fleuret
cs.AI
Abstract
Scaling up deep Reinforcement Learning (RL) methods presents a significant
challenge. Following developments in generative modelling, model-based RL
positions itself as a strong contender. Recent advances in sequence modelling
have led to effective transformer-based world models, albeit at the price of
heavy computations due to the long sequences of tokens required to accurately
simulate environments. In this work, we propose Delta-IRIS, a new agent with
a world model architecture composed of a discrete autoencoder that encodes
stochastic deltas between time steps and an autoregressive transformer that
predicts future deltas by summarizing the current state of the world with
continuous tokens. In the Crafter benchmark, Delta-IRIS sets a new state of
the art at multiple frame budgets, while being an order of magnitude faster to
train than previous attention-based approaches. We release our code and models
at https://github.com/vmicheli/delta-iris.
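To make the two-component architecture concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a discrete autoencoder that tokenizes only the stochastic delta between consecutive time steps, and an autoregressive transformer that summarizes each step with continuous tokens and predicts the next delta tokens. All module names, dimensions, and wiring here are illustrative assumptions, not the authors' implementation; the actual code is at https://github.com/vmicheli/delta-iris.

```python
# Illustrative sketch of a Delta-IRIS-style world model (assumptions throughout;
# see https://github.com/vmicheli/delta-iris for the authors' implementation).
import torch
import torch.nn as nn


class DeltaAutoencoder(nn.Module):
    """Discrete autoencoder: compresses the stochastic delta between x_t and
    x_{t+1} (given action a_t) into a few discrete tokens, rather than
    tokenizing the full frame x_{t+1}."""

    def __init__(self, frame_dim=256, action_dim=16, codebook_size=512,
                 n_delta_tokens=4, code_dim=64):
        super().__init__()
        self.n_delta_tokens, self.code_dim = n_delta_tokens, code_dim
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, n_delta_tokens * code_dim),
        )
        self.codebook = nn.Embedding(codebook_size, code_dim)
        self.decoder = nn.Sequential(
            nn.Linear(frame_dim + action_dim + n_delta_tokens * code_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def encode(self, x_t, a_t, x_next):
        z = self.encoder(torch.cat([x_t, a_t, x_next], dim=-1))
        z = z.view(-1, self.n_delta_tokens, self.code_dim)
        # Nearest-codebook-entry quantization (a real VQ autoencoder would add
        # a straight-through estimator and commitment losses during training).
        return torch.cdist(z, self.codebook.weight).argmin(dim=-1)

    def decode(self, x_t, a_t, delta_tokens):
        # Reconstruct x_{t+1} from the previous frame, action, and delta codes.
        codes = self.codebook(delta_tokens).flatten(1)
        return self.decoder(torch.cat([x_t, a_t, codes], dim=-1))


class DeltaTransformer(nn.Module):
    """Autoregressive transformer: each time step is summarized by continuous
    tokens (frame and action embeddings) instead of long sequences of discrete
    frame tokens; the model predicts the next step's delta tokens."""

    def __init__(self, frame_dim=256, action_dim=16, codebook_size=512,
                 n_delta_tokens=4, d_model=128):
        super().__init__()
        self.frame_proj = nn.Linear(frame_dim, d_model)    # continuous state token
        self.action_proj = nn.Linear(action_dim, d_model)  # continuous action token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.delta_head = nn.Linear(d_model, n_delta_tokens * codebook_size)
        self.n_delta_tokens, self.codebook_size = n_delta_tokens, codebook_size

    def forward(self, frames, actions):
        # frames: (B, T, frame_dim); actions: (B, T, action_dim)
        tokens = self.frame_proj(frames) + self.action_proj(actions)
        T = tokens.shape[1]
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.backbone(tokens, mask=causal)
        logits = self.delta_head(h)
        # (B, T, n_delta_tokens, codebook_size): per-step delta-token logits
        return logits.view(*logits.shape[:2], self.n_delta_tokens, self.codebook_size)
```

The point of the sketch is the sequence-length budget: because only the delta between steps is discretized, each time step contributes a handful of tokens plus a continuous summary, rather than the hundreds of frame tokens per step used by earlier transformer world models, which is consistent with the order-of-magnitude training speedup the abstract reports.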