Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Abstract



Developing autonomous agents that interact effectively with Graphical User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Using techniques optimized for small models, we build the 3B-parameter Ferret-UI Lite agent by curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool use, and applying reinforcement learning with designed rewards. Ferret-UI Lite achieves performance competitive with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, it achieves success rates of 28.0% on AndroidWorld and 19.8% on OSWorld. We share our methods and the lessons learned from developing compact, on-device GUI agents.
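GUI grounding scores like those above are commonly reported as click accuracy: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The following is a minimal Python sketch of that convention, assuming pixel-coordinate boxes. The names `Box`, `point_in_box_reward`, and `grounding_accuracy` are hypothetical. The binary hit signal is shown only as one plausible shape for a grounding reward; the designed rewards actually used to train Ferret-UI Lite are not specified in this abstract and may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of a target UI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive on all edges: a click exactly on the border counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def point_in_box_reward(pred: tuple[float, float], target: Box) -> float:
    """Hypothetical binary reward: 1.0 if the predicted click lands inside the target."""
    x, y = pred
    return 1.0 if target.contains(x, y) else 0.0


def grounding_accuracy(
    preds: Sequence[tuple[float, float]], targets: Sequence[Box]
) -> float:
    """Percentage of predicted click points that fall inside their target boxes."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(point_in_box_reward(p, t) for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    # Toy data (not from the paper): two of the three clicks hit their targets.
    targets = [Box(10, 10, 50, 30), Box(100, 200, 180, 240), Box(0, 0, 20, 20)]
    preds = [(25, 20), (90, 220), (20, 20)]
    print(f"{grounding_accuracy(preds, targets):.1f}%")  # prints "66.7%"
```

Reusing one hit test for both evaluation and reward keeps the training signal aligned with the benchmark metric, although a real reward design might also add partial credit, for example based on distance to the box center.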
