HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization

Abstract

Large language models enable agents to perform tasks autonomously in open web environments. However, as hidden threats on the web evolve, web agents must balance task performance against emerging risks during long-sequence operations. Although this challenge is critical, current research remains limited to single-objective optimization or single-turn scenarios and cannot jointly optimize safety and utility in web environments. To address this gap, we propose HarmonyGuard, a multi-agent collaborative framework that leverages policy enhancement and objective optimization to jointly improve both utility and safety. HarmonyGuard features a multi-agent architecture with two core capabilities: (1) Adaptive Policy Enhancement: the Policy Agent within HarmonyGuard automatically extracts and maintains structured security policies from unstructured external documents and continuously updates them in response to evolving threats. (2) Dual-Objective Optimization: based on the dual objectives of safety and utility, the Utility Agent integrated within HarmonyGuard performs Markovian real-time reasoning to evaluate the objectives and uses metacognitive capabilities to optimize them. Extensive evaluations on multiple benchmarks show that HarmonyGuard improves policy compliance by up to 38% and task completion by up to 20% over existing baselines, while achieving over 90% policy compliance across all tasks. Our project is available here: https://github.com/YurunChen/HarmonyGuard.
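To make the two capabilities concrete, the following is a minimal illustrative sketch, not the HarmonyGuard implementation. It assumes a hypothetical `Policy` schema, a `PolicyStore` standing in for the Policy Agent's maintained rule set, and an `evaluate_step` function standing in for the Utility Agent's per-step check. Simple keyword matching replaces the LLM-based reasoning that the abstract describes, and every name here is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    """One structured security rule (hypothetical schema, not from the paper)."""
    policy_id: str
    description: str
    banned_terms: tuple[str, ...]  # toy stand-in for an LLM-based violation check


@dataclass
class PolicyStore:
    """Toy stand-in for the rule set a policy agent would maintain."""
    policies: dict[str, Policy] = field(default_factory=dict)

    def upsert(self, policy: Policy) -> None:
        # Adding or replacing a rule models updating policies as threats evolve.
        self.policies[policy.policy_id] = policy


@dataclass
class Verdict:
    safe: bool
    useful: bool
    violated: list[str]

    @property
    def accept(self) -> bool:
        # Dual objective: a step is accepted only if it is both safe and useful.
        return self.safe and self.useful


def evaluate_step(action: str, task_keywords: tuple[str, ...],
                  store: PolicyStore) -> Verdict:
    """Judge one proposed action from the current step only (no history)."""
    lowered = action.lower()
    violated = [
        p.policy_id
        for p in store.policies.values()
        if any(term in lowered for term in p.banned_terms)
    ]
    useful = any(keyword in lowered for keyword in task_keywords)
    return Verdict(safe=not violated, useful=useful, violated=violated)


if __name__ == "__main__":
    store = PolicyStore()
    store.upsert(Policy("P1", "Do not enter stored credentials", ("password",)))
    for action in ("Click 'Add to cart'", "Type saved password into popup"):
        print(action, "->", evaluate_step(action, ("cart",), store))
```

The sketch evaluates each step from the current action alone, which is one simple reading of "Markovian" reasoning. How HarmonyGuard actually represents policies, scores utility, or applies metacognitive correction is not specified in the abstract.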


