

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

October 31, 2023
Authors: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
cs.AI

Abstract

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky; therefore, offline RL becomes particularly challenging when in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers that effectively uses pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to effectively combine the pre-trained knowledge from LMs with in-domain knowledge, (3) using non-linear MLP transformations instead of linear projections to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on language. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io
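The four components above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the function names (`lora_forward`, `embed_state`, `total_loss`), the rank `r`, the loss weight `lam`, and all dimensions are illustrative assumptions chosen only to show the shape of each idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) A pre-trained weight matrix, kept frozen during fine-tuning
#     (standing in for an LM transformer layer initialized from pre-training).
d_model = 8
W_pretrained = rng.standard_normal((d_model, d_model))

# (2) LoRA: learn only a low-rank update B @ A on top of the frozen weight,
#     instead of fine-tuning all of W_pretrained.
r = 2                                   # low rank, r << d_model
A = rng.standard_normal((r, d_model)) * 0.01
B = np.zeros((d_model, r))              # zero init: starts exactly at W_pretrained

def lora_forward(x):
    """y = x @ (W + B A)^T, where only A and B would be trainable."""
    return x @ (W_pretrained + B @ A).T

# (3) A non-linear MLP embedding (one hidden layer with tanh),
#     replacing a plain linear projection of the input state.
state_dim = 4
W1 = rng.standard_normal((state_dim, d_model))
W2 = rng.standard_normal((d_model, d_model))

def embed_state(s):
    return np.tanh(s @ W1) @ W2

# (4) Combined objective: RL prediction loss plus an auxiliary
#     language-prediction loss, weighted by a hypothetical coefficient lam.
def total_loss(rl_loss, lang_loss, lam=0.1):
    return rl_loss + lam * lang_loss

s = rng.standard_normal((1, state_dim))
y = lora_forward(embed_state(s))
print(y.shape)  # (1, 8)
```

Because `B` is zero-initialized, the LoRA branch contributes nothing at the start of fine-tuning, so the model begins exactly at the pre-trained weights and only gradually departs from them; this is the standard LoRA design choice.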