Efficient World Models with Context-Aware Tokenization

June 27, 2024
作者: Vincent Micheli, Eloi Alonso, François Fleuret
cs.AI

Abstract

Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose Delta-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, Delta-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.
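
To make the architecture concrete, below is a minimal PyTorch sketch of the two components the abstract describes. Everything in it is an illustrative assumption rather than the authors' implementation: the layer sizes, the VQ-style quantizer, the residual "current frame plus decoded delta" reconstruction, and the 64x64 frames with a 17-action space (Crafter-like) are all guesses; the released code at the repository above is authoritative.

```python
# Minimal illustrative sketch of the two-component world model described in
# the abstract: a discrete autoencoder over per-step deltas, and an
# autoregressive transformer that predicts the next delta tokens from
# continuous tokens summarizing the current state. All sizes and layouts
# below are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class DeltaAutoencoder(nn.Module):
    """Encodes the stochastic delta between frames x_t and x_{t+1} as discrete tokens."""

    def __init__(self, n_codes: int = 512, d_code: int = 64):
        super().__init__()
        # The encoder sees both frames (6 channels) so the codes only need to
        # capture what changed between time steps.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=4), nn.ReLU(),
            nn.Conv2d(64, d_code, 4, stride=4),
        )
        self.codebook = nn.Embedding(n_codes, d_code)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(d_code, 64, 4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=4),
        )

    def quantize(self, z: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour codebook lookup (straight-through gradients and
        # the commitment loss needed for training are omitted for brevity).
        flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        return idx.view(z.shape[0], z.shape[2], z.shape[3])

    def forward(self, x_t, x_next):
        tokens = self.quantize(self.encoder(torch.cat([x_t, x_next], dim=1)))
        z_q = self.codebook(tokens).permute(0, 3, 1, 2)
        # Reconstruct the next frame as the current frame plus a decoded delta.
        return x_t + self.decoder(z_q), tokens


class DeltaWorldModel(nn.Module):
    """Autoregressively predicts next-step delta tokens from continuous state tokens."""

    def __init__(self, n_codes: int = 512, d_model: int = 128, n_actions: int = 17):
        super().__init__()
        # Continuous tokens: a light conv encoder summarizes the full frame as
        # a 4x4 grid of embeddings instead of a long discrete token sequence.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.ReLU(),
            nn.Conv2d(64, d_model, 4, stride=4),
        )
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.delta_emb = nn.Embedding(n_codes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_codes)

    def forward(self, x_t, action, delta_tokens):
        state = self.frame_enc(x_t).flatten(2).transpose(1, 2)  # (B, 16, d)
        act = self.action_emb(action).unsqueeze(1)              # (B, 1, d)
        deltas = self.delta_emb(delta_tokens.flatten(1))        # (B, 16, d), teacher-forced
        seq = torch.cat([state, act, deltas], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.shape[1])
        h = self.transformer(seq, mask=mask)
        # Positions from the action onward each predict the following delta token.
        return self.head(h[:, state.shape[1]:-1, :])            # (B, 16, n_codes)


# Toy forward pass on random data.
ae, wm = DeltaAutoencoder(), DeltaWorldModel()
x_t, x_next = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
recon, tokens = ae(x_t, x_next)
logits = wm(x_t, torch.tensor([0, 1]), tokens)
print(recon.shape, tokens.shape, logits.shape)  # (2,3,64,64) (2,4,4) (2,16,512)
```

The point of the split, as the abstract describes it, is that the transformer only generates a short sequence of delta tokens per step while reading a compact continuous summary of the state, which keeps per-step sequence lengths, and hence attention cost, small.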
