SkillHarness: Safe Skill Harnessing for Computer-Use Agents

Abstract

Computer-Use Agents (CUAs) are increasingly deployed in dynamic interactive environments, creating a growing need for continual skill learning during interaction. Recent approaches address this need by learning reusable skills from successful trajectories. However, these methods largely assume static, safe environments and overlook risks from adversarial interactions (e.g., prompt injections) and environmental dynamics (e.g., pop-ups). In dynamic settings, such assumptions can lead to risky skill learning and brittle execution, undermining the reliability of CUAs. This raises the question: how can CUAs learn and use skills safely in dynamic environments? To address this problem, we propose SkillHarness, a framework for safe skill harnessing in dynamic environments. SkillHarness moves beyond static skill abstractions by modeling skill learning and utilization as a safety-constrained interaction process. Specifically, we introduce a skill boundary that leverages multi-source supervision signals to identify safe skills from interaction trajectories and that constructs self-improving safety constraints throughout the skill lifecycle. In addition, SkillHarness introduces selective skill reuse, in which tasks are decomposed according to context and completed by selectively activating subsets of skills. Our experiments show that SkillHarness reduces the unsafe rate of learned skills by 57.1% and consistently improves execution stability under dynamic environmental changes, outperforming existing baselines.
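
As a rough illustration of the two mechanisms named in the abstract, the following toy sketch filters a skill library through a safety check and then activates only the skills relevant to the current subtask. It is not the SkillHarness implementation: the `Skill` fields, the source names `rule_check` and `judge_model`, the unanimous-agreement rule, and the keyword-overlap relevance test are all assumptions made for illustration, since the abstract does not specify how supervision signals are combined or how skills are selected.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    keywords: frozenset  # hypothetical: terms describing when the skill applies
    # hypothetical: supervision source -> whether that source judged the skill safe
    safety_votes: dict = field(default_factory=dict)


def within_boundary(skill, required_sources):
    """Admit a skill only if every required source judged it safe (assumed rule)."""
    return all(skill.safety_votes.get(src, False) for src in required_sources)


def select_skills(library, subtask_terms, required_sources):
    """Activate only skills that are both admitted and relevant to the subtask."""
    return [
        s.name
        for s in library
        if within_boundary(s, required_sources) and s.keywords & subtask_terms
    ]


SOURCES = ("rule_check", "judge_model")  # hypothetical signal sources

library = [
    Skill("fill_login_form", frozenset({"login", "form"}),
          {"rule_check": True, "judge_model": True}),
    Skill("click_popup_link", frozenset({"popup", "link"}),
          {"rule_check": False, "judge_model": True}),
    Skill("export_report", frozenset({"report", "export"}),
          {"rule_check": True, "judge_model": True}),
]

active = select_skills(library, {"login", "popup"}, SOURCES)
print(active)  # ['fill_login_form']
```

In this toy run, `click_popup_link` matches the subtask but is rejected because one source flagged it, and `export_report` is safe but irrelevant, so only `fill_login_form` is activated.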