Large Language Models as Generalizable Policies for Embodied Tasks
October 26, 2023
作者: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
cs.AI
Abstract
We show that large language models (LLMs) can be adapted to be generalizable
policies for embodied visual tasks. Our approach, called Large LAnguage model
Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take
as input text instructions and visual egocentric observations and output
actions directly in the environment. Using reinforcement learning, we train
LLaRP to see and act solely through environmental interactions. We show that
LLaRP is robust to complex paraphrasings of task instructions and can
generalize to new tasks that require novel optimal behavior. In particular, on
1,000 unseen tasks it achieves a 42% success rate, 1.7x the success rate of other
common learned baselines or zero-shot applications of LLMs. Finally, to aid the
community in studying language-conditioned, massively multi-task, embodied AI
problems, we release a novel benchmark, Language Rearrangement, consisting of
150,000 training and 1,000 testing tasks for language-conditioned
rearrangement. Video examples of LLaRP executing unseen Language Rearrangement
instructions are at https://llm-rl.github.io.
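The abstract describes the recipe only at a high level: a frozen pre-trained LLM consumes the text instruction together with egocentric visual observations and produces environment actions, with the surrounding adapter modules trained by reinforcement learning. Below is a minimal PyTorch sketch of how such a policy could be wired together; the small stand-in transformer used in place of a real LLM, the module names (`visual_adapter`, `action_head`, `value_head`), and all dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class LLaRPStylePolicy(nn.Module):
    """Minimal sketch of an LLaRP-style policy: a frozen language-model
    backbone with trainable visual-input and action-output adapters.
    Sizes and module names are illustrative, not the paper's architecture."""

    def __init__(self, llm_hidden=256, vision_dim=512, num_actions=70):
        super().__init__()
        # Stand-in for a frozen pre-trained LLM; in practice this would be a
        # large decoder-only transformer loaded from a checkpoint.
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=llm_hidden, nhead=8, batch_first=True
            ),
            num_layers=2,
        )
        for p in self.llm.parameters():
            p.requires_grad_(False)  # keep the backbone frozen

        # Trainable adapters: project egocentric visual features into the
        # LLM embedding space, and decode hidden states into an action
        # distribution plus a value estimate for actor-critic RL.
        self.visual_adapter = nn.Linear(vision_dim, llm_hidden)
        self.action_head = nn.Linear(llm_hidden, num_actions)
        self.value_head = nn.Linear(llm_hidden, 1)

    def forward(self, instruction_embeds, visual_feats):
        # instruction_embeds: (B, T_text, llm_hidden) instruction token embeddings
        # visual_feats:       (B, T_obs, vision_dim) per-step egocentric features
        obs_tokens = self.visual_adapter(visual_feats)
        tokens = torch.cat([instruction_embeds, obs_tokens], dim=1)
        hidden = self.llm(tokens)
        last = hidden[:, -1]  # condition the action on the latest observation
        return self.action_head(last), self.value_head(last)


if __name__ == "__main__":
    policy = LLaRPStylePolicy()
    instr = torch.randn(2, 12, 256)   # dummy instruction embeddings
    obs = torch.randn(2, 4, 512)      # dummy visual features
    logits, value = policy(instr, obs)
    print(logits.shape, value.shape)  # torch.Size([2, 70]) torch.Size([2, 1])
```

In a real setup, the stand-in encoder would be replaced by a frozen pre-trained LLM and the adapters optimized with an on-policy RL algorithm (e.g., PPO) against environment reward, matching the abstract's description of training solely through environmental interaction.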