Learning from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents

Abstract

Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
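As an illustration of the student-aware idea described above, the following is a minimal sketch, not the LearnWeak implementation. It assumes that each evaluated task carries a skill tag and a pass/fail outcome for both the student and the reference agent, that weaknesses are the skills where the reference succeeds but the student fails, and that a failed step can be labeled as a planning or an execution error by comparing the student's plan and action with the reference's. The names `Attempt`, `weakness_profile`, `allocate_tasks`, and `classify_error`, the skill tags, and the proportional budget rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    """One evaluated task (hypothetical record layout)."""
    task_id: str
    skill: str          # assumed skill tag, e.g. "chart"
    student_ok: bool    # did the small student agent succeed?
    reference_ok: bool  # did the stronger reference agent succeed?


def weakness_profile(attempts):
    """Per-skill fraction of tasks the reference solves but the student does not."""
    gaps = defaultdict(int)
    totals = defaultdict(int)
    for a in attempts:
        totals[a.skill] += 1
        if a.reference_ok and not a.student_ok:
            gaps[a.skill] += 1
    return {skill: gaps[skill] / totals[skill] for skill in totals}


def allocate_tasks(profile, budget):
    """Split a task-synthesis budget across skills in proportion to the gap.

    Rounding means the allocations may not sum exactly to the budget.
    """
    total = sum(profile.values())
    if total == 0:
        return {skill: 0 for skill in profile}
    return {skill: round(budget * gap / total) for skill, gap in profile.items()}


def classify_error(student_plan, reference_plan, student_action, reference_action):
    """Assumed heuristic: a diverging plan is a planning error; a matching plan
    with a diverging action is an execution error."""
    if student_plan != reference_plan:
        return "planning"
    if student_action != reference_action:
        return "execution"
    return "none"


attempts = [
    Attempt("t1", "sort", student_ok=False, reference_ok=True),
    Attempt("t2", "sort", student_ok=True, reference_ok=True),
    Attempt("t3", "chart", student_ok=False, reference_ok=True),
    Attempt("t4", "chart", student_ok=False, reference_ok=True),
    Attempt("t5", "save", student_ok=True, reference_ok=True),
]
profile = weakness_profile(attempts)
allocation = allocate_tasks(profile, budget=30)
```

In this toy run the synthesis budget concentrates on the skills with the largest student-reference gap, in contrast to sampling tasks uniformly over the domain. The error label from `classify_error` could then decide whether a training update targets the plan or the low-level action, which is one plausible reading of an objective that disentangles planning and execution errors.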