Fara-7B: An Efficient Agentic Model for Computer Use
November 24, 2025
Authors: Ahmed Awadallah, Yash Lara, Raghav Magazine, Hussein Mozannar, Akshay Nambi, Yash Pandya, Aravind Rajeswaran, Corby Rosset, Alexey Taymanov, Vibhav Vineet, Spencer Whitehead, Andrew Zhao
cs.AI
Abstract
Progress in computer use agents (CUAs) has been constrained by the absence of large, high-quality datasets that capture how humans interact with a computer. While LLMs have thrived on abundant textual data, no comparable corpus exists for CUA trajectories. To address this gap, we introduce FaraGen, a novel synthetic data generation system for multi-step web tasks. FaraGen can propose diverse tasks from frequently used websites, generate multiple solution attempts, and filter successful trajectories using multiple verifiers. It achieves high throughput, yield, and diversity for multi-step web tasks, producing verified trajectories at approximately $1 each. We use this data to train Fara-7B, a native CUA model that perceives the computer using only screenshots, executes actions via predicted coordinates, and is small enough to run on-device. We find that Fara-7B outperforms other CUA models of comparable size on benchmarks like WebVoyager, Online-Mind2Web, and WebTailBench -- our novel benchmark that better captures web tasks under-represented in pre-existing benchmarks. Furthermore, Fara-7B is competitive with much larger frontier models, illustrating key benefits of scalable data generation systems in advancing small, efficient agentic models. We are making Fara-7B open-weight on Microsoft Foundry and HuggingFace, and we are releasing WebTailBench.
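The FaraGen pipeline described above can be summarized as a three-stage loop: propose tasks from a website, attempt each task several times, and keep only trajectories that pass every verifier. The following is a minimal illustrative sketch of that control flow; all names (`propose_tasks`, `attempt_task`, `Trajectory`, the verifier callables) are hypothetical and do not reflect the actual FaraGen implementation or API.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """A hypothetical record of one solution attempt: the task string
    plus a sequence of (screenshot, action) steps."""
    task: str
    steps: list = field(default_factory=list)
    verified: bool = False

def generate_verified_trajectories(websites, propose_tasks, attempt_task,
                                   verifiers, attempts_per_task=3):
    """Sketch of the propose -> attempt -> verify loop.

    propose_tasks(site) -> list of task strings for that website
    attempt_task(task)  -> a Trajectory (one solution attempt)
    verifiers           -> callables Trajectory -> bool; a trajectory is
                           kept only if *all* verifiers accept it
    """
    verified = []
    for site in websites:
        for task in propose_tasks(site):
            for _ in range(attempts_per_task):
                traj = attempt_task(task)
                if all(check(traj) for check in verifiers):
                    traj.verified = True
                    verified.append(traj)
                    break  # stop after the first verified solution
    return verified
```

In this framing, multiple attempts per task raise yield (a failed rollout can be retried), while the conjunction of verifiers trades throughput for precision in the retained training data.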