AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Abstract

Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs). These agents demonstrate strong reasoning and adaptability, enabling them to perform complex tasks that traditionally required predefined rules. However, the reliance on step-by-step reasoning in LLM-based agents often results in inefficiencies, particularly for routine tasks. In contrast, traditional rule-based systems excel in efficiency but lack the intelligence and flexibility to adapt to novel scenarios. To address this challenge, we propose a novel evolutionary framework for GUI agents that enhances operational efficiency while retaining intelligence and flexibility. Our approach incorporates a memory mechanism that records the agent's task execution history. By analyzing this history, the agent identifies repetitive action sequences and evolves high-level actions that act as shortcuts, replacing these low-level operations and improving efficiency. This allows the agent to focus on tasks requiring more complex reasoning, while simplifying routine actions. Experimental results on multiple benchmark tasks demonstrate that our approach significantly outperforms existing methods in both efficiency and accuracy. The code will be open-sourced to support further research.
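The idea of mining an execution history for repeated action sequences and promoting them to shortcuts can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not describe the memory structure or the evolution procedure in detail, so the simple n-gram counting used here, the function names (`mine_shortcuts`, `compress`), the action labels, and the thresholds are all assumptions chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: each low-level action is a plain string label,
# and the "memory" is a list of per-task action traces.
Action = str
Trace = List[Action]


def mine_shortcuts(
    history: List[Trace],
    min_len: int = 2,
    max_len: int = 4,
    min_count: int = 2,
) -> Dict[str, Tuple[Action, ...]]:
    """Return named shortcuts for action sequences seen in >= min_count traces."""
    counts: Counter = Counter()
    for trace in history:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(trace) - n + 1):
                seen.add(tuple(trace[i:i + n]))
        # Count each distinct sequence at most once per trace.
        counts.update(seen)
    frequent = [seq for seq, c in counts.items() if c >= min_count]
    # Longest sequences first, ties broken alphabetically for determinism.
    frequent.sort(key=lambda seq: (-len(seq), seq))
    return {f"shortcut_{i}": seq for i, seq in enumerate(frequent)}


def compress(trace: Trace, shortcuts: Dict[str, Tuple[Action, ...]]) -> List[str]:
    """Greedily rewrite a trace, replacing known sequences with shortcut names."""
    ordered = sorted(shortcuts.items(), key=lambda kv: -len(kv[1]))
    out: List[str] = []
    i = 0
    while i < len(trace):
        for name, seq in ordered:
            if tuple(trace[i:i + len(seq)]) == seq:
                out.append(name)
                i += len(seq)
                break
        else:
            # No shortcut matches here; keep the low-level action.
            out.append(trace[i])
            i += 1
    return out


# Illustrative history (made-up action labels, not data from the paper).
history = [
    ["open_app", "tap_search", "type_query", "tap_result"],
    ["open_app", "tap_search", "type_query", "scroll"],
]
shortcuts = mine_shortcuts(history)
compressed = compress(history[0], shortcuts)
```

In this toy history the three-step prefix shared by both traces becomes one high-level action, so the first trace shrinks from four steps to two. A real agent would also need to decide when a shortcut is applicable on the current screen, which this sketch does not address.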
