
An Interactive Agent Foundation Model

February 8, 2024
Authors: Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang
cs.AI

Abstract

The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.
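
The abstract names three pre-training objectives (visual masked auto-encoding, language modeling, and next-action prediction) but does not say how they are combined. One plausible reading, shown here purely as an illustration, is a weighted sum of per-objective losses. The function names, the equal default weights, and the toy inputs below are all assumptions made for this sketch; they are not taken from the paper and should not be read as the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def masked_mse(pred: Sequence[float], target: Sequence[float],
               mask: Sequence[bool]) -> float:
    """Mean squared error over masked positions only (MAE-style reconstruction)."""
    errs = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def joint_loss(lang_logits: Sequence[float], lang_target: int,
               action_logits: Sequence[float], action_target: int,
               patch_pred: Sequence[float], patch_target: Sequence[float],
               patch_mask: Sequence[bool],
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical combined objective: a weighted sum of the three losses.

    The weighting scheme is an assumption; the abstract does not specify one.
    """
    w_lang, w_act, w_mae = weights
    return (w_lang * cross_entropy(lang_logits, lang_target)
            + w_act * cross_entropy(action_logits, action_target)
            + w_mae * masked_mse(patch_pred, patch_target, patch_mask))


# Toy single-step example with made-up values.
loss = joint_loss(
    lang_logits=[0.0, 0.0], lang_target=0,            # next text token
    action_logits=[0.0, 0.0, 0.0], action_target=1,   # next action token
    patch_pred=[1.0, 2.0, 3.0], patch_target=[1.0, 0.0, 0.0],
    patch_mask=[False, True, False],                  # only the masked patch counts
)
print(loss)
```

In a real system each term would be averaged over a batch of sequences and the model would produce the logits and patch predictions; the sketch only shows how three heterogeneous objectives can share one scalar training signal.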