

Zero-shot Model-based Reinforcement Learning using Large Language Models

October 15, 2024
Authors: Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl
cs.AI

Abstract

The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as key challenges that limit the potential of LLMs' deployment in this setup and propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.
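The abstract outlines a disentangle-then-forecast pipeline: the multivariate state (and control) trajectory is first transformed into decorrelated components, each of which is forecast independently by a pretrained LLM acting as a zero-shot in-context time-series predictor, before mapping back to the original space. Below is a minimal sketch of that idea, not the authors' implementation: it assumes PCA as the disentangling step, and `llm_forecast_univariate` is a hypothetical placeholder (here a naive persistence forecast) standing in for the LLM-based univariate forecaster.

```python
# Minimal sketch of a disentangle-then-forecast pipeline in the spirit of DICL.
# Assumptions: PCA as the disentangling transform; `llm_forecast_univariate`
# is a placeholder for a pretrained LLM doing zero-shot in-context forecasting.
import numpy as np
from sklearn.decomposition import PCA


def llm_forecast_univariate(series: np.ndarray, horizon: int) -> np.ndarray:
    """Placeholder for an LLM-based univariate in-context forecaster.

    A naive persistence forecast is used so the sketch runs end to end; in
    practice this call would serialize the series as text, query a pretrained
    LLM, and parse the generated continuation.
    """
    return np.repeat(series[-1], horizon)


def dicl_style_predict(trajectory: np.ndarray, horizon: int, n_components: int) -> np.ndarray:
    """Forecast the next `horizon` steps of a multivariate trajectory.

    trajectory: array of shape (T, d) holding the concatenated states (and,
    optionally, control inputs) observed so far.
    """
    # 1) Disentangle: project the d correlated channels onto components
    #    that can be forecast independently.
    pca = PCA(n_components=n_components)
    latent = pca.fit_transform(trajectory)          # (T, n_components)

    # 2) Forecast each latent channel separately with the (placeholder) LLM.
    latent_future = np.column_stack(
        [llm_forecast_univariate(latent[:, k], horizon) for k in range(n_components)]
    )                                               # (horizon, n_components)

    # 3) Map the component forecasts back to the original state space.
    return pca.inverse_transform(latent_future)     # (horizon, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(size=(50, 6))              # toy multivariate trajectory
    print(dicl_style_predict(history, horizon=5, n_components=3).shape)  # (5, 6)
```

For the authors' actual method, reference implementation, and the reinforcement-learning applications (model-based policy evaluation and data-augmented off-policy RL), see the released code at https://github.com/abenechehab/dicl.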
