Think Before You Act: Decision Transformers with Internal Working Memory
May 24, 2023
作者: Jikun Kang, Romain Laroche, Xindi Yuan, Adam Trischler, Xue Liu, Jie Fu
cs.AI
Abstract
Large language model (LLM)-based decision-making agents have shown the
ability to generalize across multiple tasks. However, their performance relies
on massive data and compute. We argue that this inefficiency stems from the
forgetting phenomenon, in which a model memorizes its behaviors in its parameters
throughout training. As a result, training on a new task may deteriorate the
model's performance on previous tasks. In contrast to LLMs' implicit memory
mechanism, the human brain utilizes distributed memory storage, which helps
manage and organize multiple skills efficiently, mitigating the forgetting
phenomenon. Thus inspired, we propose an internal working memory module to
store, blend, and retrieve information for different downstream tasks.
Evaluation results show that the proposed method improves training efficiency
and generalization in both Atari games and Meta-World object manipulation
tasks. Moreover, we demonstrate that memory fine-tuning further enhances the
adaptability of the proposed architecture.
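
The abstract describes the internal working memory module only at a high level (store, blend, and retrieve information for different downstream tasks). As an illustration, below is a minimal PyTorch-style sketch of a slot-based working memory that retrieves slots with content-based attention, blends the retrieved content with the current hidden state through a learned gate, and writes back with a soft, attention-weighted overwrite. All names (`WorkingMemory`, `reset`, `read`, `write`) and design choices (gated blending, soft overwrite) are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a slot-based internal working memory, inspired by the
# store / blend / retrieve description in the abstract. Not the authors' code.
import torch
import torch.nn as nn


class WorkingMemory(nn.Module):
    """Slot memory with attention-based read and gated, soft write."""

    def __init__(self, num_slots: int = 16, dim: int = 128):
        super().__init__()
        # Learnable initial memory slots, copied per batch when an episode starts.
        self.init_memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)      # hidden state -> read/write key
        self.write_proj = nn.Linear(dim, dim)      # hidden state -> write content
        self.gate = nn.Linear(2 * dim, 1)          # blends retrieved memory with input

    def reset(self, batch_size: int) -> torch.Tensor:
        # Fresh per-batch copy of the learnable slots: (B, N, D).
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1).contiguous()

    def read(self, memory: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Retrieve: content-based addressing over slots, then gated blending.
        query = self.query_proj(hidden)                                   # (B, D)
        attn = torch.softmax(torch.einsum("bd,bnd->bn", query, memory), dim=-1)
        retrieved = torch.einsum("bn,bnd->bd", attn, memory)              # (B, D)
        g = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return g * retrieved + (1 - g) * hidden

    def write(self, memory: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Store: soft overwrite of the addressed slots with new content.
        query = self.query_proj(hidden)
        attn = torch.softmax(torch.einsum("bd,bnd->bn", query, memory), dim=-1)
        content = torch.tanh(self.write_proj(hidden))                     # (B, D)
        return memory + attn.unsqueeze(-1) * (content.unsqueeze(1) - memory)


# Usage sketch: one decision step of a transformer-based agent could read from
# and write to the memory with its current step embedding.
mem_module = WorkingMemory(num_slots=16, dim=128)
memory = mem_module.reset(batch_size=4)
hidden = torch.randn(4, 128)                 # e.g. a Decision Transformer step embedding
blended = mem_module.read(memory, hidden)    # retrieve + blend for the policy head
memory = mem_module.write(memory, hidden)    # store the new information
```

Because the memory lives outside the transformer's weights, task-specific information can be updated or fine-tuned in the slots without overwriting parameters shared across tasks, which is the mechanism the abstract credits for mitigating forgetting.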