MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Abstract

Current mobile assistants either depend on system APIs or, because of limited comprehension and decision-making abilities, struggle with complex user instructions and diverse interfaces. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models (MLLMs) that enhances comprehension and planning capabilities through a sophisticated two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, tracking history memories, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. Integrating a Reflection Module allows for efficient task completion and enables the system to handle previously unseen complex tasks. MobA demonstrates significant improvements in task execution efficiency and completion rate in real-life evaluations, underscoring the potential of MLLM-empowered mobile assistants.
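To make the two-level control flow concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Global Agent plans sub-tasks, the Local Agent emits one function-call-style action per sub-task, and a reflection step checks the outcome and triggers a retry. Every name here (`GlobalAgent`, `LocalAgent`, `Action`, `reflect`, `run`, `fake_device`, the `tap` and `scroll_then_tap` actions) is hypothetical, and the MLLM calls are replaced by trivial stand-in rules.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A low-level action expressed as a function call (name plus arguments)."""
    name: str
    args: dict


@dataclass
class GlobalAgent:
    """High-level agent: interprets the command, plans sub-tasks, keeps history."""
    memory: list = field(default_factory=list)

    def plan(self, command: str) -> list:
        # Stand-in for an MLLM planning call: split the command on " then ".
        return [step.strip() for step in command.split(" then ") if step.strip()]

    def remember(self, sub_task: str, action: Action, ok: bool) -> None:
        # Record what was tried and whether it worked.
        self.memory.append((sub_task, action.name, ok))


class LocalAgent:
    """Low-level agent: turns one sub-task plus GA memory into a function call."""

    def predict(self, sub_task: str, memory: list, attempt: int) -> Action:
        # Stand-in policy (assumption): plain tap first, scroll-and-tap on retry.
        name = "tap" if attempt == 0 else "scroll_then_tap"
        return Action(name=name, args={"target": sub_task})


def reflect(action: Action, execute) -> bool:
    """Reflection step (simplified): execute the action and report success."""
    return bool(execute(action))


def run(command: str, execute, max_attempts: int = 2) -> list:
    """Drive the GA -> LA -> reflection loop and return the GA's history."""
    ga, la = GlobalAgent(), LocalAgent()
    for sub_task in ga.plan(command):
        for attempt in range(max_attempts):
            action = la.predict(sub_task, ga.memory, attempt)
            ok = reflect(action, execute)
            ga.remember(sub_task, action, ok)
            if ok:
                break
    return ga.memory


def fake_device(action: Action) -> bool:
    # Hypothetical device: a plain tap on "open settings" fails, all else succeeds.
    return not (action.name == "tap" and action.args["target"] == "open settings")


history = run("open settings then enable wifi", fake_device)
```

In this toy run the first sub-task fails once and succeeds on the retry, so the history holds three entries. A real system would replace the stand-in rules with model calls and feed screenshots or UI state to both agents.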
