
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

May 3, 2023
Authors: Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye
cs.AI

Abstract

Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture imposes several constraints that make it difficult for an LLM to serve directly as the agent: e.g., limited input length, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem rather than solve it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out objects and receptacles that are irrelevant to the current sub-task from the observation. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction-following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
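To make the three modules concrete, below is a minimal Python sketch of how a PET-style pipeline could wrap an LLM. The `query_llm` helper, the prompt wording, and the class layout are illustrative assumptions, not the authors' implementation; in the paper, the modules sit in front of a separate low-level trainable actor that executes primitive actions.

```python
# Hypothetical sketch of the Plan, Eliminate, and Track (PET) pipeline.
# `query_llm`, the prompts, and the yes/no parsing are assumptions for
# illustration only; they are not the authors' code.

from dataclasses import dataclass, field
from typing import List


def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pre-trained LLM (assumed interface)."""
    raise NotImplementedError


@dataclass
class PET:
    sub_tasks: List[str] = field(default_factory=list)
    current: int = 0

    def plan(self, task_description: str) -> None:
        # Plan module: translate the task description into high-level sub-tasks.
        response = query_llm(
            f"List the sub-tasks needed to accomplish: {task_description}"
        )
        self.sub_tasks = [ln.strip() for ln in response.splitlines() if ln.strip()]

    def eliminate(self, observation: List[str]) -> List[str]:
        # Eliminate module: mask out objects and receptacles irrelevant
        # to the current sub-task, shrinking the actor's observation.
        sub_task = self.sub_tasks[self.current]
        kept = []
        for obj in observation:
            answer = query_llm(
                f"Is '{obj}' relevant to the sub-task '{sub_task}'? "
                "Answer yes or no."
            )
            if answer.strip().lower().startswith("yes"):
                kept.append(obj)
        return kept

    def track(self, observation: List[str]) -> bool:
        # Track module: decide whether the current sub-task is complete;
        # if so, advance to the next sub-task.
        sub_task = self.sub_tasks[self.current]
        answer = query_llm(
            f"Given the observation {observation}, has the sub-task "
            f"'{sub_task}' been accomplished? Answer yes or no."
        )
        done = answer.strip().lower().startswith("yes")
        if done and self.current < len(self.sub_tasks) - 1:
            self.current += 1
        return done
```

Under this reading, the LLM never selects low-level actions itself: it only decomposes the goal, filters the observation, and signals sub-task completion, which keeps the framework compatible with any trainable low-level actor.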