

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

October 3, 2025
作者: Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfister, Long T. Le
cs.AI

Abstract

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised by adversarial attacks. Mitigating these dangers necessitates a mechanism to formally guarantee that an agent's actions adhere to predefined safety constraints, a challenge that existing systems do not fully address. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents through a dual-stage architecture designed for robust and verifiable correctness. The initial offline stage involves a comprehensive validation process. It begins by clarifying user intent to establish precise safety specifications. VeriGuard then synthesizes a behavioral policy and subjects it to both testing and formal verification to prove its compliance with these specifications. This iterative process refines the policy until it is deemed correct. Subsequently, the second stage provides online action monitoring, where VeriGuard operates as a runtime monitor to validate each proposed agent action against the pre-verified policy before execution. This separation of the exhaustive offline validation from the lightweight online monitoring allows formal guarantees to be practically applied, providing a robust safeguard that substantially improves the trustworthiness of LLM agents.
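The dual-stage design described above maps naturally to a small amount of orchestration code: an offline loop that refines a policy until tests and formal verification both pass, and an online gate that checks each proposed action against the verified policy. The sketch below is a minimal illustration under those assumptions, not the paper's implementation; every name in it (SafetySpec, synthesize_policy, run_tests, formally_verify, ActionMonitor) is a hypothetical placeholder, and the synthesis and verification steps are stubbed where the real system would invoke an LLM and a formal verifier.

```python
# Minimal sketch of the dual-stage pattern from the abstract.
# All names here are illustrative placeholders, not VeriGuard's actual API;
# synthesis and verification are stubbed for self-containment.

from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class SafetySpec:
    """Safety constraints distilled from clarified user intent."""
    predicate: Callable[[dict], bool]  # True iff an action is safe

def synthesize_policy(spec: SafetySpec, feedback: Optional[str] = None):
    """Stub: an LLM would generate policy code from the spec,
    conditioned on any counterexample from a failed verification round."""
    return lambda action: spec.predicate(action)

def run_tests(policy, spec: SafetySpec) -> bool:
    """Stub: concrete test cases exercising the candidate policy."""
    samples = [{"op": "read"}, {"op": "delete"}]
    return all(policy(a) == spec.predicate(a) for a in samples)

def formally_verify(policy, spec: SafetySpec) -> Tuple[bool, Optional[str]]:
    """Stub: a formal check (e.g., via an SMT solver) that the policy
    satisfies the spec for all inputs; returns (ok, counterexample)."""
    return True, None

def offline_validation(spec: SafetySpec, max_rounds: int = 5):
    """Stage 1: iteratively refine the policy until both testing and
    formal verification pass, as the abstract describes."""
    feedback = None
    for _ in range(max_rounds):
        policy = synthesize_policy(spec, feedback)
        verified, counterexample = formally_verify(policy, spec)
        if run_tests(policy, spec) and verified:
            return policy
        feedback = counterexample  # refine on the next round
    raise RuntimeError("no verified policy within the round budget")

class ActionMonitor:
    """Stage 2: lightweight runtime monitor that pre-checks every
    proposed agent action against the pre-verified policy."""
    def __init__(self, verified_policy):
        self.policy = verified_policy

    def approve(self, action: dict) -> bool:
        return self.policy(action)

# Usage: only actions the verified policy admits reach execution.
spec = SafetySpec(predicate=lambda a: a.get("op") != "delete")
monitor = ActionMonitor(offline_validation(spec))
assert monitor.approve({"op": "read"})
assert not monitor.approve({"op": "delete"})
```

The split mirrors the paper's key trade-off: the expensive synthesize-test-verify loop runs once offline, so the per-action runtime check reduces to a cheap policy evaluation.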