HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization

Abstract

Large language models enable agents to perform tasks autonomously in open web environments. However, as threats hidden in the web evolve, web agents must balance task performance against emerging risks over long sequences of operations. Despite the importance of this challenge, current research remains limited to single-objective optimization or single-turn scenarios and cannot jointly optimize safety and utility in web environments. To address this gap, we propose HarmonyGuard, a multi-agent collaborative framework that leverages policy enhancement and objective optimization to jointly improve utility and safety. HarmonyGuard's multi-agent architecture provides two fundamental capabilities: (1) Adaptive Policy Enhancement: the Policy Agent automatically extracts and maintains structured security policies from unstructured external documents and continuously updates them in response to evolving threats. (2) Dual-Objective Optimization: guided by the dual objectives of safety and utility, the Utility Agent performs Markovian real-time reasoning to evaluate both objectives and uses metacognitive capabilities to optimize them. Extensive evaluations on multiple benchmarks show that HarmonyGuard improves policy compliance by up to 38% and task completion by up to 20% over existing baselines, while achieving over 90% policy compliance across all tasks. Our project is available at https://github.com/YurunChen/HarmonyGuard.
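To make the two capabilities more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that a structured policy can be reduced to an identifier plus a list of forbidden keywords, and that "Markovian" evaluation means judging only the agent's current step without consulting earlier history. Every name here (`Policy`, `Step`, `Verdict`, `evaluate_step`) and the keyword-matching heuristics are hypothetical; the actual framework relies on LLM-based agents, which this toy code does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """One structured safety rule (hypothetical schema, not from the paper)."""
    policy_id: str
    description: str
    forbidden_keywords: tuple[str, ...]


@dataclass
class Step:
    """The agent's latest step: what it observed and what it proposes to do."""
    observation: str
    proposed_action: str


@dataclass
class Verdict:
    """Result of checking one step against both objectives."""
    safe: bool
    useful: bool
    violated: list[str] = field(default_factory=list)


def evaluate_step(step: Step, task_goal: str, policies: list[Policy]) -> Verdict:
    """Markovian check: looks only at the current step, never at earlier history."""
    action = step.proposed_action.lower()

    # Safety objective: collect every policy whose forbidden keyword appears.
    violated = [
        p.policy_id
        for p in policies
        if any(k.lower() in action for k in p.forbidden_keywords)
    ]

    # Utility objective (toy proxy): the action mentions a word from the goal.
    goal_words = {w for w in task_goal.lower().split() if len(w) > 3}
    useful = any(w in action for w in goal_words)

    return Verdict(safe=not violated, useful=useful, violated=violated)


if __name__ == "__main__":
    policies = [
        Policy("P1", "Never submit payment credentials", ("credit card", "cvv")),
    ]
    step = Step("Checkout page is open", "type credit card number into form")
    print(evaluate_step(step, "order a laptop charger", policies))
```

In this sketch a step that fails either check would be sent back to the agent for revision, which is one plausible reading of the optimization loop described in the abstract; the paper itself should be consulted for the actual mechanism.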
