

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

October 31, 2023
Authors: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
cs.AI

Abstract

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky; therefore, offline RL becomes particularly challenging when in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers that effectively uses pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to effectively combine the pre-trained knowledge from LMs with in-domain knowledge, (3) using non-linear MLP transformations instead of linear projections to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on language. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io
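The four components above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the function names (`lora_forward`, `embed_state`, `total_loss`), the rank `r`, the loss weight `lam`, and all dimensions are illustrative assumptions chosen only to show the shape of each idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) A pre-trained weight matrix, kept frozen during fine-tuning
#     (standing in for an LM transformer layer initialized from pre-training).
d_model = 8
W_pretrained = rng.standard_normal((d_model, d_model))

# (2) LoRA: learn only a low-rank update B @ A on top of the frozen weight,
#     instead of fine-tuning all of W_pretrained.
r = 2                                   # low rank, r << d_model
A = rng.standard_normal((r, d_model)) * 0.01
B = np.zeros((d_model, r))              # zero init: starts exactly at W_pretrained

def lora_forward(x):
    """y = x @ (W + B A)^T, where only A and B would be trainable."""
    return x @ (W_pretrained + B @ A).T

# (3) A non-linear MLP embedding (one hidden layer with tanh),
#     replacing a plain linear projection of the input state.
state_dim = 4
W1 = rng.standard_normal((state_dim, d_model))
W2 = rng.standard_normal((d_model, d_model))

def embed_state(s):
    return np.tanh(s @ W1) @ W2

# (4) Combined objective: RL prediction loss plus an auxiliary
#     language-prediction loss, weighted by a hypothetical coefficient lam.
def total_loss(rl_loss, lang_loss, lam=0.1):
    return rl_loss + lam * lang_loss

s = rng.standard_normal((1, state_dim))
y = lora_forward(embed_state(s))
print(y.shape)  # (1, 8)
```

Because `B` is zero-initialized, the LoRA branch contributes nothing at the start of fine-tuning, so the model begins exactly at the pre-trained weights and only gradually departs from them; this is the standard LoRA design choice.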