Retrieval-Augmented Decision Transformer: External Memory for In-context RL
October 9, 2024
Authors: Thomas Schmied, Fabian Paischer, Vihang Patil, Markus Hofmarcher, Razvan Pascanu, Sepp Hochreiter
cs.AI
Abstract
In-context learning (ICL) is the ability of a model to learn a new task by
observing a few exemplars in its context. While prevalent in NLP, this
capability has recently also been observed in Reinforcement Learning (RL)
settings. Prior in-context RL methods, however, require entire episodes in the
agent's context. Given that complex environments typically lead to long
episodes with sparse rewards, these methods are constrained to simple
environments with short episodes. To address these challenges, we introduce
Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external
memory mechanism to store past experiences from which it retrieves only
sub-trajectories relevant for the current situation. The retrieval component in
RA-DT does not require training and can be entirely domain-agnostic. We
evaluate the capabilities of RA-DT on grid-world environments, robotics
simulations, and procedurally-generated video games. On grid-worlds, RA-DT
outperforms baselines, while using only a fraction of their context length.
Furthermore, we illuminate the limitations of current in-context RL methods on
complex environments and discuss future directions. To facilitate future
research, we release datasets for four of the considered environments.
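To make the retrieval mechanism concrete, below is a minimal NumPy sketch of such an external memory, assuming sub-trajectories are fixed-length arrays of state features. The frozen embedding function, the cosine-similarity retrieval, and all names (TrajectoryMemory, embed_fn, W) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class TrajectoryMemory:
    """External memory over past sub-trajectories (illustrative sketch,
    not the authors' code). Keys are embeddings from a frozen encoder;
    values are the raw sub-trajectories themselves."""

    def __init__(self, embed_fn, dim):
        self.embed_fn = embed_fn              # frozen, domain-agnostic encoder
        self.keys = np.empty((0, dim))        # one embedding row per stored entry
        self.values = []                      # raw sub-trajectory chunks

    def add(self, sub_trajectory):
        """Embed a sub-trajectory and append it to the memory."""
        key = self.embed_fn(sub_trajectory)
        self.keys = np.vstack([self.keys, key[None, :]])
        self.values.append(sub_trajectory)

    def retrieve(self, query, k=4):
        """Return the k entries most similar to the query sub-trajectory
        (cosine similarity; no trained retrieval parameters)."""
        q = self.embed_fn(query)
        keys = self.keys / (np.linalg.norm(self.keys, axis=1, keepdims=True) + 1e-8)
        q = q / (np.linalg.norm(q) + 1e-8)
        top = np.argsort(-(keys @ q))[:k]
        return [self.values[i] for i in top]

# Toy usage: a random projection of mean-pooled states stands in for
# a pretrained embedding model (an assumption, not the paper's choice).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
embed = lambda traj: traj.mean(axis=0) @ W    # sub-trajectory -> R^8

memory = TrajectoryMemory(embed, dim=8)
for _ in range(100):                          # store 100 past sub-trajectories
    memory.add(rng.normal(size=(10, 16)))     # 10 steps of 16-dim states
retrieved = memory.retrieve(rng.normal(size=(10, 16)), k=4)
```

In a full system, the retrieved sub-trajectories would then be conditioned on by the Decision Transformer in place of entire episodes, which is what allows the agent's own context to stay short even in long, sparse-reward environments.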