
Augmenting Autotelic Agents with Large Language Models

May 21, 2023
作者: Cédric Colas, Laetitia Teodorescu, Pierre-Yves Oudeyer, Xingdi Yuan, Marc-Alexandre Côté
cs.AI

Abstract

Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal representations are culturally evolved and transmitted across individuals, in particular using language. Current artificial agents mostly rely on predefined goal representations corresponding to goal spaces that are either bounded (e.g. list of instructions), or unbounded (e.g. the space of possible visual inputs) but are rarely endowed with the ability to reshape their goal representations, to form new abstractions or to imagine creative goals. In this paper, we introduce a language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals. The LM is used as an imperfect model of human cultural transmission; an attempt to capture aspects of humans' common-sense, intuitive physics and overall interests. Specifically, it supports three key components of the autotelic architecture: 1) a relabeler that describes the goals achieved in the agent's trajectories, 2) a goal generator that suggests new high-level goals along with their decomposition into subgoals the agent already masters, and 3) reward functions for each of these goals. Without relying on any hand-coded goal representations, reward functions or curriculum, we show that LMA3 agents learn to master a large diversity of skills in a task-agnostic text-based environment.
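The three LM-supported components described above can be sketched as a minimal Python skeleton. This is a hedged illustration, not the paper's implementation: `fake_lm` is a hypothetical stub standing in for a pretrained language model call, and all prompts, goal strings, and function names are assumptions made for the example.

```python
# Minimal sketch of the three LM-supported autotelic components (LMA3-style).
# `fake_lm` is a hypothetical stub standing in for a pretrained LM; the
# prompts and canned replies below are illustrative assumptions only.

def fake_lm(prompt: str) -> str:
    """Stub standing in for a pretrained language model."""
    if prompt.startswith("Describe"):
        return "picked up the key"                      # relabeler reply
    if prompt.startswith("Suggest"):
        return "open the chest: go to chest, use key"   # goal + subgoals
    if prompt.startswith("Did the agent achieve"):
        return "yes"                                    # reward judgement
    return ""

def relabel(trajectory: list) -> str:
    """1) Relabeler: describe the goals achieved in an agent trajectory."""
    return fake_lm("Describe what was achieved: " + " -> ".join(trajectory))

def generate_goal(mastered: list) -> tuple:
    """2) Goal generator: propose a high-level goal plus its decomposition
    into subgoals the agent already masters."""
    reply = fake_lm("Suggest a new goal given mastered skills: " + ", ".join(mastered))
    goal, subgoal_str = reply.split(": ", 1)
    return goal, [s.strip() for s in subgoal_str.split(",")]

def reward(goal: str, trajectory: list) -> float:
    """3) Reward function: the LM judges whether the goal was achieved."""
    answer = fake_lm(f"Did the agent achieve '{goal}'? Trajectory: {trajectory}")
    return 1.0 if answer.startswith("yes") else 0.0

# One pass of the autotelic loop: relabel a trajectory, imagine a new goal,
# then score a later trajectory against that goal.
achieved = relabel(["go north", "take key"])
goal, subgoals = generate_goal([achieved])
score = reward(goal, subgoals)
```

In the actual system the stub would be a real pretrained LM, and the loop would feed relabeled trajectories and generated goals back into the agent's training; the sketch only shows how the three components plug together.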