Next Embedding Prediction Makes World Models Stronger

Abstract

Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
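To make the next-embedding objective concrete, here is a minimal, standard-library sketch. A single causal self-attention layer stands in for the temporal transformer and reads a sequence of latent states. A linear head maps the output at step t to a predicted encoder embedding for step t+1. The predictions are then scored against the actual next-step embeddings. The abstract does not specify NE-Dreamer's loss or layer structure, so several choices here are assumptions: the cosine-based alignment loss, the one-head, one-layer architecture, and every name (`init_params`, `predict_next_embeddings`, `next_embedding_loss`). Treat this as an illustration of the idea, not the authors' implementation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def init_params(state_dim: int, hidden_dim: int, embed_dim: int, seed: int = 0) -> Dict[str, Matrix]:
    """Hypothetical parameters: one attention head plus an output projection."""
    rng = random.Random(seed)
    return {
        "Wq": random_matrix(hidden_dim, state_dim, rng),
        "Wk": random_matrix(hidden_dim, state_dim, rng),
        "Wv": random_matrix(hidden_dim, state_dim, rng),
        "Wout": random_matrix(embed_dim, hidden_dim, rng),
    }


def causal_attention(states: List[Vector], p: Dict[str, Matrix]) -> List[Vector]:
    """Single-head self-attention in which step t attends only to steps 0..t."""
    q = [matvec(p["Wq"], s) for s in states]
    k = [matvec(p["Wk"], s) for s in states]
    v = [matvec(p["Wv"], s) for s in states]
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs = []
    for t in range(len(states)):
        scores = [dot(q[t], k[j]) * scale for j in range(t + 1)]  # causal mask: j <= t
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        outputs.append([sum(weights[j] / z * v[j][i] for j in range(t + 1))
                        for i in range(len(v[0]))])
    return outputs


def predict_next_embeddings(states: List[Vector], p: Dict[str, Matrix]) -> List[Vector]:
    """Prediction t uses states[0..t] and targets embedding t+1 (needs >= 2 steps)."""
    hidden = causal_attention(states, p)
    return [matvec(p["Wout"], h) for h in hidden[:-1]]  # last step has no next target


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    return dot(a, b) / max(math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)), eps)


def next_embedding_loss(preds: List[Vector], embeddings: List[Vector]) -> float:
    """Mean (1 - cosine similarity) between predictions and embeddings[1:].

    Assumption: in an actual training loop the target embeddings would often be
    held fixed (stop-gradient); this sketch only computes the scalar loss value.
    """
    targets = embeddings[1:]
    losses = [1.0 - cosine(pr, tg) for pr, tg in zip(preds, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(1)
    T, state_dim, embed_dim = 5, 4, 3
    states = [[rng.uniform(-1.0, 1.0) for _ in range(state_dim)] for _ in range(T)]
    embeddings = [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)] for _ in range(T)]
    params = init_params(state_dim, hidden_dim=4, embed_dim=embed_dim)
    preds = predict_next_embeddings(states, params)
    print(len(preds), "predictions, loss =", round(next_embedding_loss(preds, embeddings), 4))
```

The regression targets are encoder embeddings rather than pixels, so no decoder appears anywhere in the sketch. This mirrors the decoder-free design described in the abstract. The causal mask ensures that each prediction depends only on past and current latent states.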