Next-Embedding Prediction Strengthens World Models

Abstract

Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that uses a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
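The next-embedding objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not NE-Dreamer's actual implementation. The single-head causal attention layer, the linear output head, the cosine-distance loss, and all names (`causal_self_attention`, `next_embedding_loss`, `init_params`, `w_q`, `w_k`, `w_v`, `w_out`) are assumptions made for illustration; the abstract does not specify the architecture or the exact alignment loss.

```python
import numpy as np


def init_params(rng, latent_dim, embed_dim):
    """Random weights for a toy one-layer predictor (hypothetical layout)."""
    scale = 1.0 / np.sqrt(latent_dim)
    return {
        "w_q": rng.normal(size=(latent_dim, latent_dim)) * scale,
        "w_k": rng.normal(size=(latent_dim, latent_dim)) * scale,
        "w_v": rng.normal(size=(latent_dim, latent_dim)) * scale,
        "w_out": rng.normal(size=(latent_dim, embed_dim)) * scale,
    }


def causal_self_attention(latents, w_q, w_k, w_v):
    """Single-head attention where step t attends only to steps <= t.

    latents: array of shape (T, D) holding one latent state per time step.
    Returns an array of shape (T, D).
    """
    q, k, v = latents @ w_q, latents @ w_k, latents @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T)
    # Mask the strict upper triangle so no step can see the future.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def next_embedding_loss(latents, embeddings, params, eps=1e-8):
    """Mean cosine distance between predicted and actual next embeddings.

    latents:    (T, D) latent states z_1..z_T.
    embeddings: (T, E) encoder embeddings e_1..e_T of the observations.
    The context at step t is used to predict e_{t+1}; the last step has
    no target and is dropped. Cosine distance is an assumed choice of
    alignment loss, not one taken from the source.
    """
    context = causal_self_attention(
        latents, params["w_q"], params["w_k"], params["w_v"]
    )
    pred = context[:-1] @ params["w_out"]  # (T-1, E)
    target = embeddings[1:]                # (T-1, E)
    pred_n = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    target_n = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cosine = np.sum(pred_n * target_n, axis=-1)
    return float(np.mean(1.0 - cosine))
```

In this sketch no decoder appears anywhere: the only training signal is agreement between the predicted and observed embeddings, which is the sense in which the objective lives "in representation space". A real agent would compute the loss in an automatic-differentiation framework and combine it with the usual reward and policy objectives.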