MobiAgent: A Systematic Framework for Customizable Mobile Agents

Abstract

With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in real-world task execution, particularly in terms of accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, recognizing that the capabilities of current mobile agents are still limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
