VeriGuard: 검증된 코드 생성을 통한 LLM 에이전트 안전성 강화

초록

헬스케어와 같은 민감한 영역에서 자율적 AI 에이전트를 배치하는 것은 안전, 보안, 개인정보 보호 측면에서 중대한 위험을 초래할 수 있습니다. 이러한 에이전트는 사용자의 목표에서 벗어나거나 데이터 처리 정책을 위반하거나 적대적 공격에 의해 손상될 가능성이 있습니다. 이러한 위험을 완화하기 위해서는 에이전트의 행동이 미리 정의된 안전 제약 조건을 준수함을 공식적으로 보장할 수 있는 메커니즘이 필요하며, 이는 기존 시스템이 완전히 해결하지 못한 과제입니다. 우리는 LLM 기반 에이전트를 위해 견고하고 검증 가능한 정확성을 목표로 이중 단계 아키텍처를 통해 공식적인 안전 보장을 제공하는 새로운 프레임워크인 VeriGuard를 소개합니다. 초기 오프라인 단계에서는 포괄적인 검증 프로세스가 진행됩니다. 이는 사용자의 의도를 명확히 하여 정확한 안전 사양을 수립하는 것으로 시작합니다. VeriGuard는 이후 행동 정책을 합성하고 이를 테스트 및 공식 검증에 적용하여 해당 사양을 준수함을 증명합니다. 이 반복적인 프로세스를 통해 정책이 정확하다고 판단될 때까지 개선됩니다. 이후 두 번째 단계에서는 온라인 행동 모니터링이 제공되며, VeriGuard는 런타임 모니터로 작동하여 실행 전에 제안된 각 에이전트 행동을 사전 검증된 정책에 대해 검증합니다. 이러한 철저한 오프라인 검증과 가벼운 온라인 모니터링의 분리는 공식적인 보장을 실질적으로 적용할 수 있게 하여 LLM 에이전트의 신뢰성을 크게 향상시키는 견고한 안전 장치를 제공합니다.

English

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised by adversarial attacks. Mitigating these dangers necessitates a mechanism to formally guarantee that an agent's actions adhere to predefined safety constraints, a challenge that existing systems do not fully address. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents through a dual-stage architecture designed for robust and verifiable correctness. The initial offline stage involves a comprehensive validation process. It begins by clarifying user intent to establish precise safety specifications. VeriGuard then synthesizes a behavioral policy and subjects it to both testing and formal verification to prove its compliance with these specifications. This iterative process refines the policy until it is deemed correct. Subsequently, the second stage provides online action monitoring, where VeriGuard operates as a runtime monitor to validate each proposed agent action against the pre-verified policy before execution. This separation of the exhaustive offline validation from the lightweight online monitoring allows formal guarantees to be practically applied, providing a robust safeguard that substantially improves the trustworthiness of LLM agents.

VeriGuard: 검증된 코드 생성을 통한 LLM 에이전트 안전성 강화

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

초록

Support