Efficient World Models with Context-Aware Tokenization

Abstract

Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose Delta-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, Delta-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.
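To make the idea of encoding deltas between time steps as discrete tokens more concrete, the following is a toy sketch and not the Delta-IRIS implementation. It assumes frames are flat lists of floats, takes the raw difference between two consecutive frames, and quantizes each patch of that difference to its nearest entry in a small hand-written codebook. The names `nearest_code`, `encode_delta`, `decode_delta`, the codebook values, and the patch size are all hypothetical. The abstract describes a learned autoencoder and stochastic deltas; this sketch substitutes deterministic nearest-neighbour quantization purely for illustration.

```python
from typing import List, Sequence

Frame = List[float]


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))


def encode_delta(prev: Frame, curr: Frame,
                 codebook: Sequence[Sequence[float]], patch: int) -> List[int]:
    """Quantize the change from prev to curr into one discrete token per patch.

    Assumption: `patch` equals the length of every codebook entry and divides
    the frame length evenly.
    """
    delta = [c - p for p, c in zip(prev, curr)]
    return [nearest_code(delta[i:i + patch], codebook)
            for i in range(0, len(delta), patch)]


def decode_delta(prev: Frame, tokens: Sequence[int],
                 codebook: Sequence[Sequence[float]]) -> Frame:
    """Reconstruct the current frame by adding the decoded delta to the previous frame."""
    recon: Frame = []
    for k, token in enumerate(tokens):
        code = codebook[token]
        start = k * len(code)
        recon.extend(p + d for p, d in zip(prev[start:start + len(code)], code))
    return recon


# Hypothetical codebook of 2-element delta patches: "no change" plus three unit moves.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
prev_frame = [0.0, 0.0, 5.0, 5.0]
curr_frame = [1.0, 0.0, 5.0, 5.0]

tokens = encode_delta(prev_frame, curr_frame, codebook, patch=2)  # [1, 0]
reconstruction = decode_delta(prev_frame, tokens, codebook)       # equals curr_frame
```

In this toy setting, unchanged regions of the frame all map to the same "no change" token, so the tokens only need to describe what differs from the previous step. A sequence model would then predict such tokens for future steps, which is the role the abstract assigns to the autoregressive transformer.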

