Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

Abstract

Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged the ability of LLMs to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to serve directly as the agent: e.g., limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose using the knowledge in LLMs to simplify the control problem rather than to solve it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out objects and receptacles in the observation that are irrelevant to the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction-following benchmark, the PET framework leads to a significant 15% improvement over the state of the art (SOTA) for generalization to human goal specifications.
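The division of labor among the three modules can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PET`, the callables `plan`, `relevance`, and `is_done`, the `threshold` parameter, and the toy stand-in functions are all hypothetical. In a real system those callables would be backed by language models; here they are simple string rules so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PET:
    """Hypothetical wrapper that simplifies the task for a low-level actor."""
    plan: Callable[[str], List[str]]        # task description -> sub-tasks
    relevance: Callable[[str, str], float]  # (sub-task, object) -> score
    is_done: Callable[[str, str], bool]     # (sub-task, observation) -> done?
    threshold: float = 0.5                  # assumed cut-off for masking
    subtasks: List[str] = field(default_factory=list)
    index: int = 0

    def reset(self, task: str) -> None:
        # Plan: turn the task description into high-level sub-tasks.
        self.subtasks = self.plan(task)
        self.index = 0

    def current(self) -> Optional[str]:
        # The sub-task being worked on, or None once all are finished.
        if self.index < len(self.subtasks):
            return self.subtasks[self.index]
        return None

    def eliminate(self, objects: List[str]) -> List[str]:
        # Eliminate: keep only objects relevant to the current sub-task.
        sub = self.current()
        if sub is None:
            return list(objects)
        return [o for o in objects if self.relevance(sub, o) >= self.threshold]

    def track(self, observation: str) -> None:
        # Track: advance to the next sub-task once the current one is done.
        sub = self.current()
        if sub is not None and self.is_done(sub, observation):
            self.index += 1


# Toy stand-ins (assumptions for illustration only, not language models).
def toy_plan(task: str) -> List[str]:
    return [s.strip() for s in task.split(" then ")]


def toy_relevance(subtask: str, obj: str) -> float:
    return 1.0 if obj in subtask else 0.0


def toy_done(subtask: str, observation: str) -> bool:
    return subtask in observation


pet = PET(plan=toy_plan, relevance=toy_relevance, is_done=toy_done)
pet.reset("take apple then heat apple")
visible = pet.eliminate(["apple", "fridge", "sofa"])
pet.track("status: take apple")
print(pet.subtasks, visible, pet.current())
```

In this sketch the low-level actor would only ever see the filtered object list and the current sub-task, which is the sense in which the language-model knowledge simplifies the control problem instead of solving it.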
