SkillHarness: Safe Skill Harnessing for Computer-Use Agents

Abstract

Computer-use agents (CUAs) are increasingly deployed in dynamic interactive environments, which creates a growing need for continual skill learning during interaction. Recent approaches address this need by learning reusable skills from successful trajectories. However, these methods largely assume static, safe environments and overlook the risks posed by adversarial interactions (e.g., prompt injection) and environmental dynamics (e.g., pop-ups). In dynamic settings, such assumptions can lead to risky skill learning and brittle execution, undermining the reliability of CUAs. This raises the question: how can CUAs learn and use skills safely in dynamic environments? To address it, we propose SkillHarness, a framework for safe skill harnessing in dynamic environments. SkillHarness moves beyond static skill abstractions by modeling skill learning and use as a safety-constrained interaction process. Specifically, we introduce a skill boundary that leverages multi-source supervision signals to identify safe skills from interaction trajectories, and we construct self-improving safety constraints that apply throughout the skill lifecycle. In addition, SkillHarness introduces selective skill reuse, in which tasks are decomposed according to context and completed by selectively activating subsets of skills. Our experiments show that SkillHarness reduces the unsafe rate of learned skills by 57.1% and consistently improves execution stability under dynamic environmental changes, outperforming existing baselines.