Next Embedding Prediction Makes World Models Stronger

Abstract

Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
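To make the idea of next-embedding prediction concrete, the following is a minimal NumPy sketch, not the NE-Dreamer implementation. It assumes a single-head causal self-attention layer as a stand-in for the temporal transformer and a cosine-distance objective as a stand-in for the alignment loss; the abstract specifies neither, and all function and parameter names here (`causal_attention`, `next_embedding_loss`, `Wq`, `Wk`, `Wv`, `Wo`) are hypothetical.

```python
import numpy as np

def causal_attention(z, Wq, Wk, Wv):
    """Single-head causal self-attention over latent states z of shape (T, D)."""
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    T = z.shape[0]
    # True above the diagonal marks future positions, which must not be attended to.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def next_embedding_loss(z, e, params):
    """Mean cosine distance between predicted and actual next-step embeddings.

    z: latent states, shape (T, D); e: encoder embeddings, shape (T, E).
    The cosine objective is an assumption made for this sketch.
    """
    Wq, Wk, Wv, Wo = params
    pred = causal_attention(z, Wq, Wk, Wv) @ Wo  # row t is a guess for e[t + 1]
    pred, target = pred[:-1], e[1:]              # shift by one step to align
    cos = np.sum(pred * target, axis=-1) / (
        np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + 1e-8
    )
    return float(np.mean(1.0 - cos))

# Toy usage with random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
T, D, E = 6, 8, 4
z = rng.normal(size=(T, D))
e = rng.normal(size=(T, E))
params = (
    rng.normal(size=(D, D)),
    rng.normal(size=(D, D)),
    rng.normal(size=(D, D)),
    rng.normal(size=(D, E)),
)
loss = next_embedding_loss(z, e, params)
```

The causal mask is what makes this a prediction task: the output at step t depends only on latent states up to t, so it cannot copy the embedding it is asked to predict. In a trained agent the target embeddings would plausibly be treated as fixed (no gradient flowing through them), but that detail is likewise an assumption here.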

