ANCHOR: Branch-Point Data Generation for GUI Agents

Abstract

End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data, yet collecting human demonstrations is expensive, and existing synthetic pipelines often suffer from limited task diversity or from noisy, goal-drifting trajectories. We present Anchor, a trajectory expansion framework that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Starting from each seed, we identify branch points that correspond to meaningful state changes and propose new, state-grounded task variants conditioned on the current GUI context. An executing agent then follows the proposed instructions to generate new trajectories, while a verifier enforces task completion via state-aware checks and trajectory-level consistency. To improve supervision quality, we further apply task-conditioned step-level filtering to remove ungrounded actions, and we denoise post-branch segments to maintain coherent intent. Experiments on the standard desktop benchmarks OSWorld and WindowsAgentArena show that models fine-tuned on our expanded corpus achieve consistent improvements over zero-shot agents and representative synthesis baselines, and that they generalize across applications and operating systems.
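The expansion loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `Step`, `expand_seed`, `is_branch_point`, `propose_tasks`, `execute`, `verify`, and `keep_step` are all hypothetical, GUI states are reduced to plain strings, and the branch-point detector, task proposer, executing agent, verifier, and step filter are passed in as callables because the abstract does not specify how they are built.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    state: str   # hypothetical stand-in for a GUI observation
    action: str  # hypothetical stand-in for the action taken in that state


Trajectory = List[Step]


def expand_seed(
    seed: Trajectory,
    is_branch_point: Callable[[str, str], bool],
    propose_tasks: Callable[[str], List[str]],
    execute: Callable[[str, str], Trajectory],
    verify: Callable[[str, Trajectory], bool],
    keep_step: Callable[[str, Step], bool],
) -> List[Tuple[str, Trajectory]]:
    """Expand one verified seed into (task, trajectory) pairs.

    Assumed flow: detect a meaningful state change, propose task variants
    grounded in that state, roll each one out, keep only verified rollouts,
    then drop steps the filter judges ungrounded for the task.
    """
    expanded: List[Tuple[str, Trajectory]] = []
    for i in range(1, len(seed)):
        prev_state, cur_state = seed[i - 1].state, seed[i].state
        if not is_branch_point(prev_state, cur_state):
            continue  # no meaningful state change between these steps
        for task in propose_tasks(cur_state):
            rollout = execute(task, cur_state)
            if not verify(task, rollout):
                continue  # discard rollouts that fail the completion check
            filtered = [s for s in rollout if keep_step(task, s)]
            if filtered:
                expanded.append((task, filtered))
    return expanded
```

In this sketch a rollout is rejected as a whole when verification fails, while step-level filtering only trims individual actions from rollouts that pass. That ordering is an assumption made for illustration.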
