Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
September 30, 2025
Authors: Zhen Yang, Zi-Yi Dou, Di Feng, Forrest Huang, Anh Nguyen, Keen You, Omar Attia, Yuhao Yang, Michael Feng, Haotian Zhang, Ram Ramrakhya, Chao Jia, Jeffrey Nichols, Alexander Toshev, Yinfei Yang, Zhe Gan
cs.AI
Abstract
Developing autonomous agents that effectively interact with Graphical User
Interfaces (GUIs) remains a challenging open problem, especially for small
on-device models. In this paper, we present Ferret-UI Lite, a compact,
end-to-end GUI agent that operates across diverse platforms, including mobile,
web, and desktop. Using techniques optimized for small models, we build the
3B-parameter Ferret-UI Lite agent by curating a diverse GUI data mixture from
real and synthetic sources, strengthening inference-time performance through
chain-of-thought reasoning and visual tool use, and applying reinforcement
learning with designed rewards. Ferret-UI Lite achieves performance
competitive with other small-scale GUI agents. In GUI grounding,
Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the
ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI
navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld
and 19.8% on OSWorld. We share our methods and lessons learned from
developing compact, on-device GUI agents.
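The grounding results reported in the abstract are accuracy figures. The sketch below assumes the common ScreenSpot-style protocol, in which a prediction counts as correct when the predicted click point falls inside the target element's bounding box; the same containment check can also serve as a simple binary reward for reinforcement learning. The names `Box`, `click_reward`, and `grounding_accuracy` are hypothetical, and this is an illustrative assumption, not a description of Ferret-UI Lite's actual evaluation code or reward design.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of the target UI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive on all edges: a click exactly on the border counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_reward(point: Tuple[float, float], target: Box) -> float:
    """Hypothetical binary reward: 1.0 if the predicted click lands inside the target, else 0.0."""
    x, y = point
    return 1.0 if target.contains(x, y) else 0.0


def grounding_accuracy(points: Sequence[Tuple[float, float]], targets: Sequence[Box]) -> float:
    """Fraction of predicted click points that fall inside their ground-truth boxes."""
    if len(points) != len(targets):
        raise ValueError("points and targets must have the same length")
    if not points:
        return 0.0
    hits = sum(click_reward(p, t) for p, t in zip(points, targets))
    return hits / len(points)


if __name__ == "__main__":
    targets = [Box(10, 10, 50, 30), Box(100, 200, 180, 240), Box(0, 0, 20, 20)]
    preds = [(30, 20), (90, 210), (20, 20)]
    # First and third clicks land inside their boxes; the second misses (x=90 < left=100).
    print(f"{grounding_accuracy(preds, targets):.3f}")  # prints 0.667
```

A binary containment reward is easy to verify automatically, but it gives no signal about how close a miss was; distance-based or IoU-style rewards are common alternatives when a denser signal is wanted.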