MobiAgent: A Systematic Framework for Customizable Mobile Agents

Abstract

With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models still face significant challenges in real-world task execution, particularly in accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, because the capabilities of current mobile agents remain limited by the availability of high-quality data, we developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared with both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
