OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
February 12, 2024
Authors: Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
cs.AI
Abstract
Autonomous interaction with the computer has been a longstanding challenge
with great potential, and the recent proliferation of large language models
(LLMs) has markedly accelerated progress in building digital agents. However,
most of these agents are designed to interact with a narrow domain, such as a
specific application or website. This narrow focus constrains their applicability
to general computer tasks. To address this, we introduce OS-Copilot, a framework
to build generalist agents capable of interfacing with comprehensive elements
in an operating system (OS), including the web, code terminals, files,
multimedia, and various third-party applications. We use OS-Copilot to create
FRIDAY, a self-improving embodied agent for automating general computer tasks.
On GAIA, a benchmark for general AI assistants, FRIDAY outperforms previous methods
by 35%, showcasing strong generalization to unseen applications via accumulated
skills from previous tasks. We also present quantitative evidence
that FRIDAY learns to control and self-improve on Excel and PowerPoint with
minimal supervision. Our OS-Copilot framework and empirical findings provide
infrastructure and insights for future research toward more capable and
general-purpose computer agents.