HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization

Abstract

Large language models enable agents to autonomously perform tasks in open web environments. However, as hidden threats on the web evolve, web agents must balance task performance against emerging risks during long-sequence operations. Although this challenge is critical, current research remains limited to single-objective optimization or single-turn scenarios and lacks the capability to jointly optimize both safety and utility in web environments. To address this gap, we propose HarmonyGuard, a multi-agent collaborative framework that leverages policy enhancement and objective optimization to jointly improve both utility and safety. HarmonyGuard features a multi-agent architecture characterized by two fundamental capabilities: (1) Adaptive Policy Enhancement: the Policy Agent within HarmonyGuard automatically extracts and maintains structured security policies from unstructured external documents and continuously updates them in response to evolving threats. (2) Dual-Objective Optimization: based on the dual objectives of safety and utility, the Utility Agent integrated within HarmonyGuard performs Markovian real-time reasoning to evaluate the objectives and uses metacognitive capabilities to optimize them. Extensive evaluations on multiple benchmarks show that HarmonyGuard improves policy compliance by up to 38% and task completion by up to 20% over existing baselines, while achieving over 90% policy compliance across all tasks. Our project is available here: https://github.com/YurunChen/HarmonyGuard.
