MobiAgent: A Systematic Framework for Customizable Mobile Agents

Abstract

With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key direction for intelligent mobile systems. However, existing agent models still struggle with accuracy and efficiency when executing tasks in real-world settings. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system with three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, because the capabilities of current mobile agents remain limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared with both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
