MobA: A Two-Level Agent System for Efficient Mobile Task Automation
October 17, 2024
Authors: Zichen Zhu, Hao Tang, Yansi Li, Kunyao Lan, Yixuan Jiang, Hao Zhou, Yixiao Wang, Situo Zhang, Liangtai Sun, Lu Chen, Kai Yu
cs.AI
Abstract
Current mobile assistants are limited by dependence on system APIs or
struggle with complex user instructions and diverse interfaces due to
restricted comprehension and decision-making abilities. To address these
challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal
large language models that enhances comprehension and planning capabilities
through a sophisticated two-level agent architecture. The high-level Global
Agent (GA) is responsible for understanding user commands, tracking history
memories, and planning tasks. The low-level Local Agent (LA) predicts detailed
actions in the form of function calls, guided by sub-tasks and memory from the
GA. Integrating a Reflection Module allows for efficient task completion and
enables the system to handle previously unseen complex tasks. MobA demonstrates
significant improvements in task execution efficiency and completion rate in
real-life evaluations, underscoring the potential of MLLM-empowered mobile
assistants.
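The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `GlobalAgent` and `LocalAgent`, the `reflect` helper, the `run` loop, and the string-splitting heuristics are all hypothetical stand-ins for what would be MLLM calls in the real system. It only shows how a planner that holds memory could hand sub-tasks to an executor that emits function-call-style actions, with a reflection check on each result.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A function-call style action, e.g. tap(target='Wi-Fi'). Hypothetical."""
    name: str
    args: dict


@dataclass
class GlobalAgent:
    """High-level agent: splits a command into sub-tasks and keeps history."""
    memory: list = field(default_factory=list)

    def plan(self, command: str) -> list:
        # Stand-in for an MLLM planning call: split the command on " then ".
        return [part.strip() for part in command.split(" then ") if part.strip()]

    def remember(self, sub_task: str, action: Action, ok: bool) -> None:
        # Record what was attempted and whether reflection accepted it.
        self.memory.append((sub_task, action, ok))


class LocalAgent:
    """Low-level agent: maps one sub-task (plus memory) to one action."""

    def act(self, sub_task: str, memory: list) -> Action:
        # Stand-in for an MLLM prediction: the first word is the function name,
        # the remainder is its target. Memory is accepted but unused here.
        verb, _, target = sub_task.partition(" ")
        return Action(name=verb.lower(), args={"target": target})


def reflect(action: Action, supported) -> bool:
    """Toy reflection step: accept only actions the device stub supports."""
    return action.name in supported


def run(command: str, supported=frozenset({"open", "tap", "type"})) -> list:
    ga, la = GlobalAgent(), LocalAgent()
    for sub_task in ga.plan(command):
        action = la.act(sub_task, ga.memory)
        ga.remember(sub_task, action, reflect(action, supported))
    return ga.memory
```

Calling `run("open Settings then tap Wi-Fi then swipe up")` yields three memory entries; under the assumed `supported` set, the last one is marked as rejected because `swipe` is not listed, which is where a real system might replan.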