
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

May 3, 2023
作者: Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye
cs.AI

Abstract

Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to serve directly as the agent: e.g., limited input length, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solve it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction-following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
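The three modules described above can be sketched as a minimal pipeline. This is an illustrative assumption, not the authors' implementation: the module names follow the abstract, but the prompt formats, the `query_llm` stub (a canned stand-in for a real LLM call), and the semicolon-separated plan format are hypothetical.

```python
# Hypothetical sketch of the PET pipeline (Plan, Eliminate, Track).
# query_llm is a deterministic stand-in for a real LLM; in practice each
# module would send its prompt to a pre-trained language model.

def query_llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned answers for this demo."""
    if prompt.startswith("Decompose"):
        return "find a mug; heat the mug; put the mug on the desk"
    if prompt.startswith("Relevant"):
        # Toy relevance rule: the object name appears in the sub-task text.
        _, subtask, obj = prompt.split("|")
        return "yes" if obj.strip() in subtask else "no"
    if prompt.startswith("Completed"):
        _, subtask, obs = prompt.split("|")
        return "yes" if subtask.strip() in obs else "no"
    return ""

def plan(task: str) -> list[str]:
    """Plan module: translate a task description into high-level sub-tasks."""
    reply = query_llm(f"Decompose the task into sub-tasks: {task}")
    return [s.strip() for s in reply.split(";")]

def eliminate(subtask: str, objects: list[str]) -> list[str]:
    """Eliminate module: mask out objects irrelevant to the current sub-task."""
    return [o for o in objects
            if query_llm(f"Relevant?|{subtask}|{o}") == "yes"]

def track(subtask: str, observation: str) -> bool:
    """Track module: decide whether the sub-task has been accomplished."""
    return query_llm(f"Completed?|{subtask}|{observation}") == "yes"
```

With this sketch, a low-level actor only ever sees one sub-task at a time and a filtered observation, which is the compatibility point the abstract emphasizes: the LLM simplifies the control problem instead of acting itself.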