Learning to Model the World with Language
July 31, 2023
Authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
cs.AI
Abstract
To interact with humans in the world, agents need to understand the diverse
types of language that people use, relate them to the visual world, and act
based on them. While current agents learn to execute simple language
instructions from task rewards, we aim to build agents that leverage diverse
language that conveys general knowledge, describes the state of the world,
provides interactive feedback, and more. Our key idea is that language helps
agents predict the future: what will be observed, how the world will behave,
and which situations will be rewarded. This perspective unifies language
understanding with future prediction as a powerful self-supervised learning
objective. We present Dynalang, an agent that learns a multimodal world model
that predicts future text and image representations and learns to act from
imagined model rollouts. Unlike traditional agents that use language only to
predict actions, Dynalang acquires rich language understanding by using past
language also to predict future language, video, and rewards. In addition to
learning from online interaction in an environment, Dynalang can be pretrained
on datasets of text, video, or both without actions or rewards. From using
language hints in grid worlds to navigating photorealistic scans of homes,
Dynalang utilizes diverse types of language to improve task performance,
including environment descriptions, game rules, and instructions.
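The core objective described above — using language, together with vision, to predict future observations — can be illustrated with a minimal sketch. This is not the actual Dynalang architecture (which uses a learned recurrent world model and imagined rollouts); all dimensions, weight names, and the linear encoders here are hypothetical stand-ins chosen only to show the shape of the self-supervised objective: fuse image and text inputs into a latent state, then score predictions of the next step's image and text representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
IMG_DIM, TXT_DIM, LATENT_DIM = 16, 8, 12

# Random linear maps standing in for learned encoders and dynamics.
W_img = rng.normal(size=(IMG_DIM, LATENT_DIM)) * 0.1
W_txt = rng.normal(size=(TXT_DIM, LATENT_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1
W_pred_img = rng.normal(size=(LATENT_DIM, IMG_DIM)) * 0.1
W_pred_txt = rng.normal(size=(LATENT_DIM, TXT_DIM)) * 0.1

def step(latent, image, text_token):
    """Fuse one image frame and one text token into the latent state,
    then predict the next step's image and text representations."""
    fused = np.tanh(latent @ W_dyn + image @ W_img + text_token @ W_txt)
    pred_img = fused @ W_pred_img   # predicted next image representation
    pred_txt = fused @ W_pred_txt   # predicted next text representation
    return fused, pred_img, pred_txt

def rollout_loss(images, tokens):
    """Self-supervised future-prediction objective: squared error between
    predicted and actual next-step representations over the sequence.
    No actions or rewards are needed, which is what allows pretraining
    on text-only or video-only data."""
    latent = np.zeros(LATENT_DIM)
    loss = 0.0
    for t in range(len(images) - 1):
        latent, pred_img, pred_txt = step(latent, images[t], tokens[t])
        loss += np.sum((pred_img - images[t + 1]) ** 2)
        loss += np.sum((pred_txt - tokens[t + 1]) ** 2)
    return loss

# Toy sequence of paired image and text representations.
images = rng.normal(size=(5, IMG_DIM))
tokens = rng.normal(size=(5, TXT_DIM))
print(rollout_loss(images, tokens))
```

Minimizing this loss with gradient descent would push the latent state to carry whatever information in past language and video helps predict the future — the intuition the paper builds on, here reduced to its simplest form.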