
An Interactive Agent Foundation Model

February 8, 2024
Authors: Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang
cs.AI

Abstract

The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains: Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.