AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

Abstract

Recent advances in large language models (LLMs) have led to intelligent LLM-based agents that can interact with graphical user interfaces (GUIs). These agents show strong reasoning ability and adaptability, which lets them perform complex tasks that traditionally required predefined rules. However, because LLM-based agents rely on step-by-step reasoning, they are often inefficient, particularly on routine tasks. Traditional rule-based systems, by contrast, are efficient but lack the intelligence and flexibility to adapt to novel scenarios. To address this challenge, we propose a novel evolutionary framework for GUI agents that improves operational efficiency while retaining intelligence and flexibility. Our approach incorporates a memory mechanism that records the agent's task execution history. By analyzing this history, the agent identifies repetitive action sequences and evolves high-level actions that serve as shortcuts, replacing those low-level operations and improving efficiency. This lets the agent simplify routine actions and focus on tasks that require more complex reasoning. Experimental results on multiple benchmark tasks demonstrate that our approach significantly outperforms existing methods in both efficiency and accuracy. The code will be open-sourced to support further research.
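
The abstract does not specify how repetitive action sequences are detected or how shortcuts are represented. As a rough illustration only, and not the authors' implementation, the idea can be sketched by counting fixed-length action n-grams across recorded task traces and replacing frequent ones with a single named high-level action. The action labels, the function names find_repeated_sequences and compress, and the length and min_count parameters are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Action = str  # hypothetical low-level action label, e.g. "tap:search_box"


def find_repeated_sequences(
    history: List[List[Action]], length: int, min_count: int
) -> List[Tuple[Action, ...]]:
    """Return action n-grams of the given length seen in at least min_count tasks."""
    counts: Counter = Counter()
    for trace in history:
        # Count each n-gram once per task so one long task cannot dominate.
        seen = {tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)}
        counts.update(seen)
    return sorted(seq for seq, n in counts.items() if n >= min_count)


def compress(trace: List[Action], shortcuts: Dict[Tuple[Action, ...], str]) -> List[Action]:
    """Rewrite a trace, replacing each known low-level sequence with its shortcut name."""
    out: List[Action] = []
    i = 0
    while i < len(trace):
        for seq, name in shortcuts.items():
            if tuple(trace[i:i + len(seq)]) == seq:
                out.append(name)  # one high-level action stands in for len(seq) steps
                i += len(seq)
                break
        else:
            out.append(trace[i])  # no shortcut starts here; keep the low-level action
            i += 1
    return out


# Hypothetical execution history: one list of low-level actions per completed task.
history = [
    ["tap:search_box", "type:query", "tap:submit", "tap:first_result"],
    ["tap:search_box", "type:query", "tap:submit", "scroll:down"],
    ["tap:settings", "tap:wifi"],
]
repeated = find_repeated_sequences(history, length=3, min_count=2)
shortcuts = {seq: f"shortcut_{k}" for k, seq in enumerate(repeated)}
print(compress(history[0], shortcuts))
```

In this toy history, the three-step search sequence appears in two tasks, so it becomes one shortcut and the first trace shrinks from four steps to two. A real agent would presumably match on UI state rather than exact action strings, which this sketch does not attempt.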
