Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

June 3, 2025
Authors: Mathieu Andreux, Breno Baldas Skuk, Hamza Benchekroun, Emilien Biré, Antoine Bonnet, Riaz Bordie, Matthias Brunel, Pierre-Louis Cedoz, Antoine Chassang, Mickaël Chen, Alexandra D. Constantinou, Antoine d'Andigné, Hubert de La Jonquière, Aurélien Delfosse, Ludovic Denoyer, Alexis Deprez, Augustin Derupti, Michael Eickenberg, Mathïs Federico, Charles Kantor, Xavier Koegler, Yann Labbé, Matthew C. H. Lee, Erwan Le Jumeau de Kergaradec, Amir Mahla, Avshalom Manevich, Adrien Maret, Charles Masson, Rafaël Maurin, Arturo Mena, Philippe Modard, Axel Moyal, Axel Nguyen Kerbel, Julien Revelle, Mats L. Richter, María Santos, Laurent Sifre, Maxime Theillard, Marc Thibault, Louis Thiry, Léo Tronchon, Nicolas Usunier, Tony Wu
cs.AI

Abstract

We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLMs) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 tops generalist User Interface (UI) benchmarks as well as our new web UI localization benchmark, WebClick. When powered by Holo1, Surfer-H achieves state-of-the-art performance of 92.2% on WebVoyager, striking a Pareto-optimal balance between accuracy and cost-efficiency. To accelerate research in agentic systems, we are open-sourcing both our WebClick evaluation dataset and the Holo1 model weights.