Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
February 24, 2025
Authors: Junyang Wang, Haiyang Xu, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Jitao Sang
cs.AI
Abstract
The rapid increase in mobile device usage necessitates improved automation
for seamless task management. However, many AI-driven frameworks struggle due
to insufficient operational knowledge. Manually written knowledge helps but is
labor-intensive and inefficient. To address these challenges, we introduce
Mobile-Agent-V, a framework that leverages video guidance to provide rich and
cost-effective operational knowledge for mobile automation. Mobile-Agent-V
improves task execution by drawing on video inputs directly, without
requiring specialized sampling or preprocessing. It combines a sliding-window
strategy with a video agent and a deep-reflection agent to keep actions aligned
with user instructions. With this approach, users only need to record a task
process under guidance; the system then learns from the recording and executes
the task autonomously and efficiently. Experimental results show that
Mobile-Agent-V achieves a 30% performance improvement over existing frameworks.
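The abstract names a sliding-window strategy but does not describe it. The sketch below is a minimal illustration under our own assumptions, not the paper's implementation. It assumes the demonstration video has been reduced to an ordered list of keyframes (labeled with plain strings for simplicity), that the agent sees a window of `size` keyframes at each step, and that the window slides forward past whichever keyframe is judged to match the current screen. `KeyframeWindow` and `match_offset` are hypothetical names, and `match_offset` is only a toy exact-match stand-in for the video agent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class KeyframeWindow:
    """Hypothetical sliding window over keyframes of a demonstration video."""
    keyframes: Sequence[str]
    size: int
    start: int = 0

    def frames(self) -> List[str]:
        # Keyframes currently shown to the decision-making agent.
        return list(self.keyframes[self.start:self.start + self.size])

    def advance(self, matched_offset: int) -> None:
        # Slide the window so it begins right after the keyframe judged to
        # match the current screen, never moving past the final keyframe.
        if not 0 <= matched_offset < len(self.frames()):
            raise ValueError("matched_offset must point inside the window")
        last = len(self.keyframes) - 1
        self.start = min(self.start + matched_offset + 1, last)


def match_offset(window: List[str], screen: str) -> int:
    # Toy stand-in for the video agent: exact label match, else the first frame.
    return window.index(screen) if screen in window else 0


frames = ["home", "settings", "wifi", "toggle", "done"]
win = KeyframeWindow(frames, size=3)
print(win.frames())  # ['home', 'settings', 'wifi']
win.advance(match_offset(win.frames(), "settings"))
print(win.frames())  # ['wifi', 'toggle', 'done']
```

One plausible reason for a window of this kind is that it bounds how many keyframes the agent must consider per step while still letting it look a few steps ahead; the window size of 3 here is an arbitrary illustrative value.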