UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

March 27, 2025
Authors: Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Guanjing Xiong, Hongsheng Li
cs.AI

Abstract

The recent DeepSeek-R1 has showcased the emergence of reasoning capabilities in large language models (LLMs) through reinforcement learning (RL) with rule-based rewards. Building on this idea, we are the first to explore how rule-based RL can enhance the reasoning capabilities of multimodal large language models (MLLMs) for graphical user interface (GUI) action prediction tasks. To this end, we curate a small yet high-quality dataset of 136 challenging tasks, encompassing five common action types on mobile devices. We also introduce a unified rule-based action reward, enabling model optimization via policy-based algorithms such as Group Relative Policy Optimization (GRPO). Experimental results demonstrate that our proposed data-efficient model, UI-R1-3B, achieves substantial improvements on both in-domain (ID) and out-of-domain (OOD) tasks. Specifically, on the ID benchmark AndroidControl, action type accuracy improves by 15% and grounding accuracy by 10.3% compared with the base model (i.e., Qwen2.5-VL-3B). On the OOD GUI grounding benchmark ScreenSpot-Pro, our model surpasses the base model by 6.0% and achieves performance competitive with larger models (e.g., OS-Atlas-7B) that are trained via supervised fine-tuning (SFT) on 76K data. These results underscore the potential of rule-based reinforcement learning to advance GUI understanding and control, paving the way for future research in this domain.
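
To make the combination of a rule-based action reward and GRPO more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the reward's components, so the three terms used here (action-type match, click point inside a ground-truth box, and a `<think>`/`<answer>` format check), their equal weighting, and the names `action_reward` and `group_relative_advantages` are all assumptions made for illustration. The only GRPO ingredient shown is the group-relative advantage, i.e. each sampled response's reward standardized against the other responses for the same prompt.

```python
import re
from statistics import mean, pstdev

# Hypothetical response format: reasoning in <think>, action in <answer>.
FORMAT_RE = re.compile(r"\s*<think>.*</think>\s*<answer>.*</answer>\s*", re.DOTALL)


def action_reward(pred_type, pred_point, gold_type, gold_box, response):
    """Illustrative rule-based reward (assumed design, not from the source).

    Sums three binary terms: action type match, click point inside the
    ground-truth box (click actions only), and response format validity.
    """
    type_r = 1.0 if pred_type == gold_type else 0.0

    coord_r = 0.0
    if pred_type == "click" and gold_type == "click" and pred_point is not None:
        x, y = pred_point
        x1, y1, x2, y2 = gold_box
        coord_r = 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

    format_r = 1.0 if FORMAT_RE.fullmatch(response) else 0.0
    return type_r + coord_r + format_r


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses.

    Uses the population standard deviation (an assumption); eps avoids
    division by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold_type, gold_box = "click", (0, 0, 100, 100)
    samples = [
        ("click", (50, 50), "<think>tap the button</think><answer>click</answer>"),
        ("scroll", None, "<think>scroll down</think><answer>scroll</answer>"),
        ("click", (300, 300), "click"),
    ]
    rewards = [action_reward(t, p, gold_type, gold_box, r) for t, p, r in samples]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the advantages are computed relative to the group mean, a response is reinforced only when it scores better than its siblings for the same task, which is why a cheap, verifiable rule-based reward can be enough to drive the policy update without a learned reward model.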
