Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

October 31, 2023
作者: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
cs.AI

Abstract

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to effectively combine the pre-trained knowledge from LMs with in-domain knowledge, (3) using a non-linear MLP transformation instead of linear projections to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original language abilities. Empirical results indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io.
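
To make the four components concrete, here is a minimal, hypothetical PyTorch sketch. It assumes a GPT-2 backbone (via the transformers library), LoRA adapters from peft, and illustrative state/action dimensions; none of these specifics are stated in the abstract, so every name and hyperparameter below is an assumption for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of LaMo's four components (the GPT-2 backbone,
# dims, and hyperparameters are illustrative assumptions).
import torch
import torch.nn as nn
from transformers import GPT2Model
from peft import LoraConfig, get_peft_model


class LaMoSketch(nn.Module):
    def __init__(self, state_dim=17, act_dim=6, hidden=768, vocab=50257):
        super().__init__()
        # (1) Initialize the Decision Transformer backbone from a
        #     sequentially pre-trained LM rather than from scratch.
        backbone = GPT2Model.from_pretrained("gpt2")
        # (2) LoRA: freeze the pre-trained weights and learn low-rank
        #     adapters on the attention projections (fan_in_fan_out
        #     because GPT-2 uses Conv1D layers).
        cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                         fan_in_fan_out=True)
        self.backbone = get_peft_model(backbone, cfg)

        # (3) Non-linear MLP input transformations instead of the single
        #     linear projections of the original Decision Transformer.
        def mlp(in_dim: int) -> nn.Module:
            return nn.Sequential(nn.Linear(in_dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, hidden))

        self.embed_return = mlp(1)
        self.embed_state = mlp(state_dim)
        self.embed_action = mlp(act_dim)
        self.predict_action = nn.Linear(hidden, act_dim)
        # (4) Auxiliary language head, kept so a language prediction loss
        #     can be added during fine-tuning to preserve the LM's
        #     original language ability.
        self.lm_head = nn.Linear(hidden, vocab, bias=False)

    def forward(self, returns, states, actions):
        # returns: (B, T, 1); states: (B, T, state_dim); actions: (B, T, act_dim)
        B, T, _ = states.shape
        # Interleave (return, state, action) tokens per timestep: (B, 3T, H).
        tokens = torch.stack([self.embed_return(returns),
                              self.embed_state(states),
                              self.embed_action(actions)],
                             dim=2).reshape(B, 3 * T, -1)
        h = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict each action from the hidden state of its state token.
        return self.predict_action(h[:, 1::3])
```

Under these assumptions, training would combine the behavior-cloning loss on actions with the auxiliary language prediction loss from component (4), e.g. an MSE term on predicted actions plus a weighted cross-entropy term from `lm_head` on held-out text tokens, with gradients flowing only through the LoRA adapters and the newly added MLP/linear heads.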