ANCHOR: Branch-Point Data Generation for GUI Agents
February 6, 2026
Authors: Jinbiao Wei, Yilun Zhao, Kangqi Ni, Arman Cohan
cs.AI
Abstract
End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data, yet collecting human demonstrations is expensive, and existing synthetic pipelines often suffer from limited task diversity or from noisy, goal-drifting trajectories. We present Anchor, a trajectory expansion framework that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Starting from each seed, we identify branch points that correspond to meaningful state changes and propose new, state-grounded task variants conditioned on the current GUI context. An executing agent then follows the proposed instructions to generate new trajectories, while a verifier enforces task completion through state-aware checks and trajectory-level consistency. To further improve supervision quality, we apply task-conditioned step-level filtering to remove ungrounded actions and denoise post-branch segments so that they maintain a coherent intent. Experiments on two standard desktop benchmarks, OSWorld and WindowsAgentArena, show that models fine-tuned on our expanded corpus achieve consistent improvements over zero-shot agents and representative synthesis baselines, and generalize across applications and operating systems.
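The abstract describes the branch-and-expand pipeline only at a high level. The following is a minimal sketch of how the loop could be wired together, under assumptions that are ours and not the paper's: a GUI state is modeled as a set of visible element identifiers, a "meaningful state change" is approximated by a Jaccard distance above a threshold, each new trajectory reuses the seed prefix up to the branch point, and `propose`, `execute`, `verify`, and `is_grounded` are hypothetical callables standing in for the model-based proposer, executing agent, verifier, and step-level filter.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List

# Hypothetical representation: a GUI state is the set of visible element ids.
State = FrozenSet[str]


@dataclass
class Step:
    action: str
    state: State  # GUI state observed after the action


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)


def state_distance(a: State, b: State) -> float:
    """Jaccard distance between two states (an assumed proxy for 'meaningful change')."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def find_branch_points(traj: Trajectory, threshold: float = 0.5) -> List[int]:
    """Indices i whose post-action state differs from step i-1's by at least `threshold`."""
    return [
        i
        for i in range(1, len(traj.steps))
        if state_distance(traj.steps[i - 1].state, traj.steps[i].state) >= threshold
    ]


def expand_seed(
    seed: Trajectory,
    propose: Callable[[State], str],              # hypothetical task proposer
    execute: Callable[[str, State], List[Step]],  # hypothetical executing agent
    verify: Callable[[Trajectory], bool],         # hypothetical completion verifier
    is_grounded: Callable[[str, Step], bool],     # hypothetical step-level filter
    threshold: float = 0.5,
) -> List[Trajectory]:
    expanded = []
    for i in find_branch_points(seed, threshold):
        prefix = seed.steps[: i + 1]          # seed steps up to and including the branch point
        branch_state = prefix[-1].state
        task = propose(branch_state)          # new task variant grounded in the current GUI state
        suffix = execute(task, branch_state)  # post-branch segment produced by the executor
        if not verify(Trajectory(task, prefix + suffix)):
            continue                          # discard trajectories that fail verification
        # Filter only the post-branch segment, dropping steps judged ungrounded for this task.
        kept = [s for s in suffix if is_grounded(task, s)]
        expanded.append(Trajectory(task, prefix + kept))
    return expanded
```

In a real system each callable would wrap a model or a state checker; the set-based state and the Jaccard threshold are purely illustrative placeholders for whatever state comparison the framework actually uses.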