UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

March 27, 2025
Authors: Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Guanjing Xiong, Hongsheng Li
cs.AI

Abstract

The recent DeepSeek-R1 has shown that reasoning capabilities can emerge in large language models (LLMs) through reinforcement learning (RL) with rule-based rewards. Building on this idea, we are the first to explore how rule-based RL can enhance the reasoning capabilities of multimodal large language models (MLLMs) on graphical user interface (GUI) action prediction tasks. To this end, we curate a small yet high-quality dataset of 136 challenging tasks covering five common action types on mobile devices. We also introduce a unified rule-based action reward, which enables model optimization via policy-based algorithms such as Group Relative Policy Optimization (GRPO). Experimental results demonstrate that our data-efficient model, UI-R1-3B, achieves substantial improvements on both in-domain (ID) and out-of-domain (OOD) tasks. Specifically, on the ID benchmark AndroidControl, action type accuracy improves by 15% and grounding accuracy by 10.3% over the base model (i.e., Qwen2.5-VL-3B). On the OOD GUI grounding benchmark ScreenSpot-Pro, our model surpasses the base model by 6.0% and is competitive with larger models (e.g., OS-Atlas-7B) that are trained via supervised fine-tuning (SFT) on 76K data samples. These results underscore the potential of rule-based reinforcement learning to advance GUI understanding and control, paving the way for future research in this domain.
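
To make the idea of a rule-based action reward combined with group-relative optimization more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the reward terms, so the function `action_reward`, its scoring (one point for a matching action type, one more for a click that lands inside the ground-truth box), and the `(x1, y1, x2, y2)` box format are all hypothetical. The `group_relative_advantages` function shows the group-wise reward normalization commonly associated with GRPO, in which each sampled response is scored relative to the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def action_reward(pred_type, pred_point, gt_type, gt_box):
    """Hypothetical rule-based reward (assumed, not taken from the paper).

    +1.0 if the predicted action type matches the ground truth;
    +1.0 more if the action is a click whose point lies inside gt_box,
    where gt_box is an assumed (x1, y1, x2, y2) bounding box.
    """
    reward = 0.0
    if pred_type == gt_type:
        reward += 1.0
        if gt_type == "click" and pred_point is not None:
            x, y = pred_point
            x1, y1, x2, y2 = gt_box
            if x1 <= x <= x2 and y1 <= y <= y2:
                reward += 1.0
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so
    responses are ranked against their siblings rather than a critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical sampled responses for one task with a click target.
gt_type, gt_box = "click", (0, 0, 10, 10)
samples = [
    ("click", (5, 5)),    # right type, point inside the box
    ("scroll", None),     # wrong type
    ("click", (50, 50)),  # right type, point outside the box
    ("click", (20, 3)),   # right type, point outside the box
]
rewards = [action_reward(t, p, gt_type, gt_box) for t, p in samples]
advantages = group_relative_advantages(rewards)
```

Because the reward is computed by fixed rules rather than a learned reward model, it needs no extra labeled preference data, which fits the data-efficient setting described in the abstract.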
