Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
September 30, 2025
Authors: Zhen Yang, Zi-Yi Dou, Di Feng, Forrest Huang, Anh Nguyen, Keen You, Omar Attia, Yuhao Yang, Michael Feng, Haotian Zhang, Ram Ramrakhya, Chao Jia, Jeffrey Nichols, Alexander Toshev, Yinfei Yang, Zhe Gan
cs.AI
Abstract
Developing autonomous agents that effectively interact with Graphical User
Interfaces (GUIs) remains a challenging open problem, especially for small
on-device models. In this paper, we present Ferret-UI Lite, a compact,
end-to-end GUI agent that operates across diverse platforms, including mobile,
web, and desktop. Using techniques optimized for developing small models,
we build our 3B-parameter Ferret-UI Lite agent by curating a diverse GUI data
mixture from real and synthetic sources, strengthening inference-time
performance through chain-of-thought reasoning and visual tool use, and
applying reinforcement learning with designed rewards. Ferret-UI Lite performs
competitively with other small-scale GUI agents. In GUI grounding,
Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the
ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI
navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld
and 19.8% on OSWorld. We share our methods and lessons learned from
developing compact, on-device GUI agents.
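
The grounding scores above are accuracies over predicted click locations. As a minimal sketch, assuming the common point-in-box convention (a prediction counts as correct when the predicted point falls inside the target element's bounding box), the metric could be computed as follows. The function names and the box layout are illustrative assumptions, not taken from the paper or from any benchmark's official evaluation code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y) predicted click location
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted points that land inside their target boxes.

    Hypothetical helper: assumes one predicted point per target element.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(predictions)
```

Under this convention, a reported score of 91.6% would mean that roughly 916 of every 1,000 predicted points land inside the correct element.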