Agent S: An Open Agentic Framework that Uses Computers Like a Human

October 10, 2024
Authors: Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang
cs.AI

Abstract

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), with the goal of transforming human-computer interaction by automating complex, multi-step tasks. Agent S is designed to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces. To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels, facilitating efficient task planning and subtask execution. In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37 percentage points in success rate (an 83.6% relative improvement), achieving a new state of the art. Comprehensive analysis highlights the effectiveness of individual components and provides insights for future improvements. Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark. The code is available at https://github.com/simular-ai/Agent-S.
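The two OSWorld figures are consistent with each other when the 9.37 gain is read as an absolute difference in success rate and the 83.6% as that difference divided by the baseline's success rate. Solving for the baseline gives the success rates these two numbers imply. This is a back-of-the-envelope derivation from the abstract's rounded figures, with $s_{\text{base}}$ and $s_{\text{Agent S}}$ as illustrative symbols; the resulting rates are approximations, not values quoted from the abstract.

```latex
\frac{s_{\text{Agent S}} - s_{\text{base}}}{s_{\text{base}}} = \frac{9.37}{s_{\text{base}}} = 0.836
\quad\Longrightarrow\quad
s_{\text{base}} \approx \frac{9.37}{0.836} \approx 11.21\%,
\qquad
s_{\text{Agent S}} \approx 11.21\% + 9.37\% = 20.58\%
```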

November 16, 2024