OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Abstract

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents. However, most of these agents are designed to interact with a narrow domain, such as a specific piece of software or a single website. This narrow focus constrains their applicability to general computer tasks. To this end, we introduce OS-Copilot, a framework for building generalist agents capable of interfacing with comprehensive elements in an operating system (OS), including the web, code terminals, files, multimedia, and various third-party applications. We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by 35%, showcasing strong generalization to unseen applications via skills accumulated from previous tasks. We also present numerical and quantitative evidence that FRIDAY learns to control and self-improve on Excel and PowerPoint with minimal supervision. Our OS-Copilot framework and empirical findings provide infrastructure and insights for future research toward more capable and general-purpose computer agents.

