Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

February 15, 2026
Authors: Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, Zhiyuan Chen, Jitong Liao, Qi Zheng, Jiahui Zeng, Ze Xu, Shuai Bai, Junyang Lin, Jingren Zhou, Ming Yan
cs.AI

Abstract

The paper introduces GUI-Owl-1.5, the latest native GUI agent model. It comes in instruct and thinking variants at multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more), enabling cloud-edge collaboration and real-time interaction. Among open-source models, GUI-Owl-1.5 achieves state-of-the-art results on more than 20 GUI benchmarks: (1) on GUI automation tasks, it obtains 56.5 on OSWorld, 71.6 on AndroidWorld, and 48.4 on WebArena; (2) on grounding tasks, it obtains 80.3 on ScreenSpotPro; (3) on tool-calling tasks, it obtains 47.6 on OSWorld-MCP and 46.8 on MobileWorld; (4) on memory and knowledge tasks, it obtains 75.5 on GUI-Knowledge Bench. GUI-Owl-1.5 incorporates three key innovations: (1) Hybrid Data Flywheel: we construct the data pipeline for UI understanding and trajectory generation from a combination of simulated environments and cloud-based sandbox environments, improving the efficiency and quality of data collection; (2) Unified Enhancement of Agent Capabilities: we use a unified thought-synthesis pipeline to enhance the model's reasoning capabilities, with particular emphasis on key agent abilities, including tool/MCP use, memory, and multi-agent adaptation; (3) Multi-platform Environment RL Scaling: we propose a new environment RL algorithm, MRPO, to address multi-platform conflicts and the low training efficiency of long-horizon tasks. The GUI-Owl-1.5 models are open-sourced, and an online cloud-sandbox demo is available at https://github.com/X-PLUG/MobileAgent.