CUA-Skill: Develop Skills for Computer Using Agent

Abstract

Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of reusable, structured skill abstractions that capture how humans interact with graphical user interfaces and how these skills can be leveraged. We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills coupled with parameterized execution and composition graphs. CUA-Skill is a large-scale library of carefully engineered skills spanning common Windows applications, serving as a practical infrastructure and tool substrate for scalable, reliable agent development. Built upon this skill base, we construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument instantiation, and memory-aware failure recovery. Our results demonstrate that CUA-Skill substantially improves execution success rates and robustness on challenging end-to-end agent benchmarks, establishing a strong foundation for future computer-using agent development. On WindowsAgentArena, CUA-Skill Agent achieves a state-of-the-art success rate of 57.5% (best of three) while being significantly more efficient than prior and concurrent approaches. The project page is available at https://microsoft.github.io/cua_skill/.
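To make the three agent capabilities named in the abstract more concrete (skill retrieval, argument instantiation, and memory-aware failure recovery), the following is a minimal illustrative sketch in Python. It is not the CUA-Skill implementation or API: the `Skill` class, the `retrieve` and `run_task` functions, the word-overlap scoring, and the example skills are all hypothetical and chosen only for illustration. The composition graph mentioned in the abstract is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Skill:
    """Hypothetical skill record: a description plus parameterized action templates."""
    name: str
    description: str
    params: List[str]
    steps: List[str]  # action templates containing {placeholders}

    def instantiate(self, args: Dict[str, str]) -> List[str]:
        """Fill the action templates with concrete argument values."""
        missing = [p for p in self.params if p not in args]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        return [step.format(**args) for step in self.steps]


def retrieve(skills: List[Skill], query: str, failed: Set[str]) -> List[Skill]:
    """Toy retrieval: rank by word overlap with the query, skipping failed skills."""
    words = set(query.lower().split())
    scored = []
    for skill in skills:
        if skill.name in failed:
            continue
        score = len(words & set(skill.description.lower().split()))
        if score > 0:
            scored.append((score, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # ties keep list order
    return [skill for _, skill in scored]


def run_task(
    skills: List[Skill],
    query: str,
    args: Dict[str, str],
    execute: Callable[[List[str]], bool],
) -> Optional[str]:
    """Try skills in ranked order, remembering failures so they are not retried."""
    failed: Set[str] = set()  # the "memory" used for failure recovery
    while True:
        ranked = retrieve(skills, query, failed)
        if not ranked:
            return None  # no remaining candidate skill
        skill = ranked[0]
        try:
            actions = skill.instantiate(args)
        except ValueError:
            failed.add(skill.name)
            continue
        if execute(actions):
            return skill.name
        failed.add(skill.name)


# Hypothetical example skills for a "save document" task.
DEMO_SKILLS = [
    Skill("save_via_menu", "save document with file menu", ["filename"],
          ["click File", "click Save As", "type {filename}", "press Enter"]),
    Skill("save_via_hotkey", "save document with keyboard shortcut", ["filename"],
          ["hotkey ctrl+s", "type {filename}", "press Enter"]),
]
```

In this sketch, `execute` stands in for whatever component actually drives the GUI. If the first retrieved skill fails, its name is recorded and the next retrieval round excludes it, which is one simple way to read "memory-aware failure recovery"; the actual mechanism in CUA-Skill Agent may differ.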
