Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
February 15, 2026
Authors: Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, Zhiyuan Chen, Jitong Liao, Qi Zheng, Jiahui Zeng, Ze Xu, Shuai Bai, Junyang Lin, Jingren Zhou, Ming Yan
cs.AI
Abstract
The paper introduces GUI-Owl-1.5, the latest native GUI agent model, which comes in instruct and thinking variants at multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. Among open-source models, GUI-Owl-1.5 achieves state-of-the-art results on more than 20 GUI benchmarks: (1) on GUI automation tasks, it obtains 56.5 on OSWorld, 71.6 on AndroidWorld, and 48.4 on WebArena; (2) on grounding tasks, it obtains 80.3 on ScreenSpotPro; (3) on tool-calling tasks, it obtains 47.6 on OSWorld-MCP and 46.8 on MobileWorld; (4) on memory and knowledge tasks, it obtains 75.5 on GUI-Knowledge Bench. GUI-Owl-1.5 incorporates three key innovations: (1) Hybrid Data Flywheel: we construct the data pipeline for UI understanding and trajectory generation on a combination of simulated environments and cloud-based sandbox environments, improving both the efficiency and the quality of data collection; (2) Unified Enhancement of Agent Capabilities: we use a unified thought-synthesis pipeline to strengthen the model's reasoning, with particular emphasis on key agent abilities, including Tool/MCP use, memory, and multi-agent adaptation; (3) Multi-platform Environment RL Scaling: we propose a new environment RL algorithm, MRPO, to address multi-platform conflicts and the low training efficiency of long-horizon tasks. The GUI-Owl-1.5 models are open-sourced, and an online cloud-sandbox demo is available at https://github.com/X-PLUG/MobileAgent.