Augmenting Autotelic Agents with Large Language Models
May 21, 2023
作者: Cédric Colas, Laetitia Teodorescu, Pierre-Yves Oudeyer, Xingdi Yuan, Marc-Alexandre Côté
cs.AI
Abstract
Humans learn to master open-ended repertoires of skills by imagining and
practicing their own goals. This autotelic learning process, literally the
pursuit of self-generated (auto) goals (telos), becomes more and more
open-ended as the goals become more diverse, abstract and creative. The
resulting exploration of the space of possible skills is supported by an
inter-individual exploration: goal representations are culturally evolved and
transmitted across individuals, in particular using language. Current
artificial agents mostly rely on predefined goal representations corresponding
to goal spaces that are either bounded (e.g. list of instructions), or
unbounded (e.g. the space of possible visual inputs) but are rarely endowed
with the ability to reshape their goal representations, to form new
abstractions or to imagine creative goals. In this paper, we introduce a
language model augmented autotelic agent (LMA3) that leverages a pretrained
language model (LM) to support the representation, generation and learning of
diverse, abstract, human-relevant goals. The LM is used as an imperfect model
of human cultural transmission; an attempt to capture aspects of humans'
common-sense, intuitive physics and overall interests. Specifically, it
supports three key components of the autotelic architecture: 1) a relabeler
that describes the goals achieved in the agent's trajectories, 2) a goal
generator that suggests new high-level goals along with their decomposition
into subgoals the agent already masters, and 3) reward functions for each of
these goals. Without relying on any hand-coded goal representations, reward
functions or curriculum, we show that LMA3 agents learn to master a large
diversity of skills in a task-agnostic text-based environment.
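The three LM-supported components above can be sketched as a minimal agent loop. This is an illustrative assumption about the architecture's shape, not the paper's implementation: all class, function, and prompt names below are hypothetical, and `call_lm` is an offline stub standing in for a pretrained language model.

```python
from dataclasses import dataclass, field

def call_lm(prompt: str) -> str:
    """Stub standing in for a pretrained LM; returns canned answers so the
    sketch runs offline. A real agent would query an LM API here."""
    if prompt.startswith("Relabel:"):
        return "the agent opened the chest"
    if prompt.startswith("Generate:"):
        return "open the chest; subgoals: find the key, unlock the chest"
    if prompt.startswith("Reward:"):
        return "yes"
    return ""

@dataclass
class AutotelicAgentSketch:
    mastered: list = field(default_factory=list)  # goals already learned

    def relabel(self, trajectory: str) -> str:
        # 1) relabeler: describe which goal the trajectory achieved
        return call_lm(f"Relabel: {trajectory}")

    def generate_goal(self) -> tuple:
        # 2) goal generator: a new high-level goal plus its decomposition
        #    into subgoals the agent already masters
        out = call_lm(f"Generate: mastered={self.mastered}")
        goal, _, subs = out.partition("; subgoals: ")
        return goal, [s.strip() for s in subs.split(",")]

    def reward(self, goal: str, trajectory: str) -> float:
        # 3) reward function: the LM judges whether the trajectory
        #    satisfies the goal (binary reward here for simplicity)
        return 1.0 if call_lm(f"Reward: {goal} | {trajectory}") == "yes" else 0.0

agent = AutotelicAgentSketch(mastered=["find the key", "unlock the chest"])
label = agent.relabel("you pick up the key and open the chest")
goal, subgoals = agent.generate_goal()
r = agent.reward(goal, "you open the chest")
```

In this sketch the LM replaces the hand-coded goal representations, reward functions, and curriculum that conventional goal-conditioned agents rely on: relabeled trajectories feed the goal generator, and the generated goals are trained against the LM-derived reward.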