Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Abstract

Developing autonomous agents that interact effectively with graphical user interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Using techniques optimized for developing small models, we build our 3B-parameter Ferret-UI Lite agent by curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool use, and applying reinforcement learning with designed rewards. Ferret-UI Lite achieves performance competitive with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld and 19.8% on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.
