
Large Language Models as Generalizable Policies for Embodied Tasks

October 26, 2023
Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
cs.AI

Abstract

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves a 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language-conditioned, massively multi-task, embodied AI problems, we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement. Video examples of LLaRP on unseen Language Rearrangement instructions are at https://llm-rl.github.io.
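The core recipe the abstract describes is: keep the pre-trained LLM frozen, project visual egocentric observations into its token space, and decode actions from its output with a small trainable head, training only the added modules via reinforcement learning. Below is a minimal sketch of that idea; the class name `LLaRPStylePolicy`, the feature sizes, and the stand-in frozen transformer are hypothetical illustrations, not the paper's actual implementation.

```python
# Minimal sketch of an LLaRP-style policy (assumptions: a frozen backbone,
# a trainable visual projection, and a trainable action head; the paper's
# exact architecture and sizes may differ).
import torch
import torch.nn as nn

class LLaRPStylePolicy(nn.Module):
    def __init__(self, llm: nn.Module, hidden_dim: int, num_actions: int):
        super().__init__()
        self.llm = llm
        for p in self.llm.parameters():      # the pre-trained LLM stays frozen
            p.requires_grad_(False)
        # Trainable adapter: maps visual features into the LLM's embedding space.
        self.visual_proj = nn.Linear(512, hidden_dim)
        # Trainable head: maps the final hidden state to action logits.
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, instr_embeds, visual_feats):
        # instr_embeds: (B, T_text, D) embedded instruction tokens
        # visual_feats: (B, T_obs, 512) egocentric observation features
        obs_tokens = self.visual_proj(visual_feats)
        seq = torch.cat([instr_embeds, obs_tokens], dim=1)
        hidden = self.llm(seq)               # frozen backbone processes the sequence
        return self.action_head(hidden[:, -1])  # logits for the next action

# Stand-in frozen "LLM" so the sketch runs without downloading real weights.
llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
)
policy = LLaRPStylePolicy(llm, hidden_dim=256, num_actions=10)
logits = policy(torch.randn(1, 8, 256), torch.randn(1, 4, 512))
print(logits.shape)  # torch.Size([1, 10])
```

In an RL loop, only `visual_proj` and `action_head` would receive gradient updates, which is what lets the frozen LLM's language priors transfer to unseen instructions.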