Retrieval-Augmented Decision Transformer: External Memory for In-context RL
October 9, 2024
Authors: Thomas Schmied, Fabian Paischer, Vihang Patil, Markus Hofmarcher, Razvan Pascanu, Sepp Hochreiter
cs.AI
Abstract
In-context learning (ICL) is the ability of a model to learn a new task by
observing a few exemplars in its context. While prevalent in NLP, this
capability has recently also been observed in Reinforcement Learning (RL)
settings. Prior in-context RL methods, however, require entire episodes in the
agent's context. Given that complex environments typically lead to long
episodes with sparse rewards, these methods are constrained to simple
environments with short episodes. To address these challenges, we introduce
Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external
memory mechanism to store past experiences from which it retrieves only
sub-trajectories relevant for the current situation. The retrieval component in
RA-DT does not require training and can be entirely domain-agnostic. We
evaluate the capabilities of RA-DT on grid-world environments, robotics
simulations, and procedurally generated video games. On grid-worlds, RA-DT
outperforms baselines, while using only a fraction of their context length.
Furthermore, we illuminate the limitations of current in-context RL methods on
complex environments and discuss future directions. To facilitate future
research, we release datasets for four of the considered environments.
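The abstract describes the retrieval component only at a high level: past experience is stored in an external memory, and only the sub-trajectories relevant to the current situation are retrieved, with no training involved. As a rough illustration, here is a minimal Python sketch of such a training-free memory. It assumes that each stored sub-trajectory has already been mapped to a fixed embedding vector and uses cosine similarity as the relevance score. The names `SubTrajectory`, `ExternalMemory`, `cosine` and `retrieve`, the toy 2-d vectors, and the choice of cosine similarity are all hypothetical and are not taken from RA-DT's actual implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SubTrajectory:
    """A stored chunk of past experience (hypothetical structure)."""
    embedding: list[float]  # fixed vector for the chunk, assumed precomputed elsewhere
    payload: object         # e.g. the states, actions and rewards of the chunk


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ExternalMemory:
    """Training-free memory: no parameters are trained when storing or retrieving."""
    items: list[SubTrajectory] = field(default_factory=list)

    def add(self, embedding: list[float], payload: object) -> None:
        self.items.append(SubTrajectory(list(embedding), payload))

    def retrieve(self, query: list[float], k: int = 2) -> list[object]:
        # Rank every stored sub-trajectory by similarity to the current situation
        # and return only the top-k chunks rather than whole episodes.
        ranked = sorted(self.items, key=lambda it: cosine(query, it.embedding), reverse=True)
        return [it.payload for it in ranked[:k]]


memory = ExternalMemory()
memory.add([1.0, 0.0], "episode-A, steps 0-49")
memory.add([0.0, 1.0], "episode-B, steps 100-149")
memory.add([0.9, 0.1], "episode-C, steps 50-99")
print(memory.retrieve([0.8, 0.2], k=2))
# → ['episode-C, steps 50-99', 'episode-A, steps 0-49']
```

Only the two best-matching chunks reach the agent's context here, which mirrors the abstract's point that RA-DT needs only a fraction of the context length of methods that keep entire episodes. This linear scan grows with memory size. A larger memory would more plausibly use a nearest-neighbour index, though that is an assumption rather than a detail given in the abstract.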