Scaling Instructable Agents Across Many Simulated Worlds
March 13, 2024
Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, Zhitao Gong, Lucy Gonzales, Karol Gregor, Arne Olav Hallingstad, Tim Harley, Sam Haves, Felix Hill, Ed Hirst, Drew A. Hudson, Steph Hughes-Fitt, Danilo J. Rezende, Mimi Jasarevic, Laura Kampis, Rosemary Ke, Thomas Keck, Junkyung Kim, Oscar Knagg, Kavya Kopparapu, Andrew Lampinen, Shane Legg, Alexander Lerchner, Marjorie Limont, Yulan Liu, Maria Loks-Thompson, Joseph Marino, Kathryn Martin Cussons, Loic Matthey, Siobhan Mcloughlin, Piermaria Mendolicchio, Hamza Merzic, Anna Mitenkova, Alexandre Moufarek, Valeria Oliveira, Yanko Oliveira, Hannah Openshaw, Renke Pan, Aneesh Pappu, Alex Platonov, Ollie Purkiss, David Reichert, John Reid, Pierre Harvey Richemond, Tyson Roberts, Giles Ruscoe, Jaume Sanchez Elias, Tasha Sandars, Daniel P. Sawyer, Tim Scholtes, Guy Simmons, Daniel Slater, Hubert Soyer, Heiko Strathmann, Peter Stys, Allison C. Tam, Denis Teplyashin, Tayfun Terzi, Davide Vercelli, Bojan Vujatovic, Marcus Wainwright, Jane X. Wang, Zhengdong Wang, Daan Wierstra, Duncan Williams, Nathaniel Wong, Sarah York, Nick Young
cs.AI
Abstract
Building embodied AI systems that can follow arbitrary language instructions
in any 3D environment is a key challenge for creating general AI. Accomplishing
this goal requires learning to ground language in perception and embodied
actions, in order to accomplish complex tasks. The Scalable, Instructable,
Multiworld Agent (SIMA) project tackles this by training agents to follow
free-form instructions across a diverse range of virtual 3D environments,
including curated research environments as well as open-ended, commercial video
games. Our goal is to develop an instructable agent that can accomplish
anything a human can do in any simulated 3D environment. Our approach focuses
on language-driven generality while imposing minimal assumptions. Our agents
interact with environments in real-time using a generic, human-like interface:
the inputs are image observations and language instructions and the outputs are
keyboard-and-mouse actions. This general approach is challenging, but it allows
agents to ground language across many visually complex and semantically rich
environments while also allowing us to readily run agents in new environments.
In this paper we describe our motivation and goal, the initial progress we have
made, and promising preliminary results on several diverse research
environments and a variety of commercial video games.