ANCHOR: Branch-Point Data Generation for GUI Agents

Abstract

End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data. Collecting human demonstrations is expensive, however, and existing synthetic pipelines often suffer from limited task diversity or from noisy, goal-drifting trajectories. We present Anchor, a trajectory expansion framework that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Starting from each seed, Anchor identifies branch points that correspond to meaningful state changes and proposes new, state-grounded task variants conditioned on the current GUI context. An executing agent then follows the proposed instructions to generate new trajectories, while a verifier enforces task completion through state-aware checks and trajectory-level consistency. To further improve supervision quality, we apply task-conditioned, step-level filtering to remove ungrounded actions, and we denoise post-branch segments so that they keep a coherent intent. Experiments on two standard desktop benchmarks, OSWorld and WindowsAgentArena, show that models fine-tuned on the expanded corpus achieve consistent improvements over zero-shot agents and representative synthesis baselines, and that they generalize across applications and operating systems.
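The expansion loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: GUI states are modeled as sets of visible UI-element IDs, a "meaningful state change" is approximated by a Jaccard-distance threshold, and `propose`, `execute`, `verify`, and `relevance` are hypothetical callables standing in for the task proposer, the executing agent, the verifier, and the step-level filter. The sketch also assumes that each expanded trajectory reuses the seed prefix up to the branch point and that filtering is applied only to the post-branch segment.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List

# Assumption: a GUI state is the set of visible UI-element IDs.
State = FrozenSet[str]

@dataclass
class Step:
    action: str          # e.g. "click(File)"
    state_after: State   # GUI state observed after the action

@dataclass
class Trajectory:
    task: str
    initial_state: State
    steps: List[Step] = field(default_factory=list)

def state_change(a: State, b: State) -> float:
    """Jaccard distance between two UI-element sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def find_branch_points(traj: Trajectory, threshold: float = 0.5) -> List[int]:
    """Indices of steps whose state change exceeds `threshold` (hypothetical criterion)."""
    points, prev = [], traj.initial_state
    for i, step in enumerate(traj.steps):
        if state_change(prev, step.state_after) > threshold:
            points.append(i)
        prev = step.state_after
    return points

def filter_steps(task: str, steps: List[Step],
                 relevance: Callable[[str, Step], float],
                 min_score: float = 0.5) -> List[Step]:
    """Task-conditioned filter: drop steps the (hypothetical) scorer deems ungrounded."""
    return [s for s in steps if relevance(task, s) >= min_score]

def expand(seed: Trajectory,
           propose: Callable[[State], List[str]],
           execute: Callable[[str, State], List[Step]],
           verify: Callable[[str, List[Step]], bool],
           relevance: Callable[[str, Step], float]) -> List[Trajectory]:
    new_trajs = []
    for i in find_branch_points(seed):
        branch_state = seed.steps[i].state_after
        prefix = seed.steps[: i + 1]          # seed steps up to the branch point
        for task in propose(branch_state):    # state-grounded task variants
            raw_suffix = execute(task, branch_state)
            if not verify(task, prefix + raw_suffix):
                continue                      # discard unverified rollouts
            clean_suffix = filter_steps(task, raw_suffix, relevance)
            if clean_suffix:
                new_trajs.append(Trajectory(task, seed.initial_state, prefix + clean_suffix))
    return new_trajs

# Toy usage with stub components (all hypothetical).
seed = Trajectory("write hello in a document", frozenset({"desktop", "taskbar"}), [
    Step("click(Files)", frozenset({"desktop", "taskbar", "files_window"})),
    Step("open(report.docx)", frozenset({"taskbar", "writer_window", "toolbar"})),
    Step("type(hello)", frozenset({"taskbar", "writer_window", "toolbar"})),
])
propose = lambda s: ["make the text bold", "save as PDF"] if "writer_window" in s else []
execute = lambda task, s: [Step("hover(menu)", s), Step(f"do({task})", s)]
verify = lambda task, steps: steps[-1].action == f"do({task})"
relevance = lambda task, step: 0.0 if step.action.startswith("hover") else 1.0

new_trajs = expand(seed, propose, execute, verify, relevance)
for t in new_trajs:
    print(t.task, [s.action for s in t.steps])
```

In this toy run only the `open(report.docx)` step changes the state enough to count as a branch point, so both proposed variants branch from the document editor, and the stub scorer removes the irrelevant `hover(menu)` step from each post-branch segment.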
