Augmenting Autotelic Agents with Large Language Models

Abstract

Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal representations are culturally evolved and transmitted across individuals, in particular using language. Current artificial agents mostly rely on predefined goal representations corresponding to goal spaces that are either bounded (e.g., a list of instructions) or unbounded (e.g., the space of possible visual inputs), but they are rarely endowed with the ability to reshape their goal representations, to form new abstractions or to imagine creative goals. In this paper, we introduce a language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals. The LM is used as an imperfect model of human cultural transmission, an attempt to capture aspects of humans' common sense, intuitive physics and overall interests. Specifically, the LM supports three key components of the autotelic architecture: 1) a relabeler that describes the goals achieved in the agent's trajectories, 2) a goal generator that suggests new high-level goals along with their decomposition into subgoals the agent already masters, and 3) reward functions for each of these goals. Without relying on any hand-coded goal representations, reward functions or curriculum, we show that LMA3 agents learn to master a large diversity of skills in a task-agnostic text-based environment.
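
The three LM-supported components named in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names (relabel, propose_goal, reward), the prompt tags, and the rule-based toy_lm that stands in for a pretrained LM are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in type for a pretrained LM: maps a prompt to text.
LM = Callable[[str], str]


def relabel(lm: LM, trajectory: List[str]) -> List[str]:
    """Relabeler: ask the LM which goals the trajectory achieved (one per line)."""
    answer = lm("RELABEL\n" + "\n".join(trajectory))
    return [g.strip() for g in answer.splitlines() if g.strip()]


def propose_goal(lm: LM, mastered: List[str]) -> Tuple[str, List[str]]:
    """Goal generator: ask for a new high-level goal and its subgoals.

    Only subgoals the agent already masters are kept.
    """
    answer = lm("PROPOSE\n" + "\n".join(mastered))
    goal, _, rest = answer.partition(":")
    subgoals = [s.strip() for s in rest.split(",") if s.strip() in mastered]
    return goal.strip(), subgoals


def reward(lm: LM, goal: str, trajectory: List[str]) -> float:
    """Reward function: 1.0 if the LM judges the goal achieved, else 0.0."""
    answer = lm(f"REWARD\n{goal}\n" + "\n".join(trajectory))
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0


def toy_lm(prompt: str) -> str:
    """Rule-based fake LM (assumption) so the sketch runs without a real model."""
    kind, _, body = prompt.partition("\n")
    lines = body.splitlines()
    if kind == "RELABEL":
        # Pretend every "take X" action achieves the goal "hold X".
        return "\n".join("hold " + a[5:] for a in lines if a.startswith("take "))
    if kind == "PROPOSE":
        # Compose all mastered goals into one higher-level goal.
        return "hold everything: " + ", ".join(lines)
    if kind == "REWARD":
        goal, steps = lines[0], lines[1:]
        achieved = goal.startswith("hold ") and ("take " + goal[5:]) in steps
        return "yes" if achieved else "no"
    return ""


trajectory = ["open fridge", "take carrot", "take knife"]
achieved_goals = relabel(toy_lm, trajectory)
new_goal, subgoals = propose_goal(toy_lm, achieved_goals)
print(achieved_goals, new_goal, subgoals)
```

In a real agent the toy_lm stub would be replaced by calls to an actual pretrained LM, and the parsing of its free-form answers would need to be far more robust than the line and comma splitting used here.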
