OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Abstract

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents. However, most of these agents are designed to interact with a narrow domain, such as a specific software application or website. This narrow focus limits their applicability to general computer tasks. To address this, we introduce OS-Copilot, a framework for building generalist agents capable of interfacing with a comprehensive set of elements in an operating system (OS), including the web, code terminals, files, multimedia, and various third-party applications. We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a benchmark for general AI assistants, FRIDAY outperforms previous methods by 35%, showing strong generalization to unseen applications via skills accumulated from previous tasks. We also present quantitative evidence that FRIDAY learns to control and self-improve on Excel and PowerPoint with minimal supervision. Our OS-Copilot framework and empirical findings provide infrastructure and insights for future research toward more capable and general-purpose computer agents.
