VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
October 3, 2025
Authors: Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfister, Long T. Le
cs.AI
Abstract
The deployment of autonomous AI agents in sensitive domains, such as
healthcare, introduces critical risks to safety, security, and privacy. These
agents may deviate from user objectives, violate data handling policies, or be
compromised by adversarial attacks. Mitigating these dangers necessitates a
mechanism to formally guarantee that an agent's actions adhere to predefined
safety constraints, a challenge that existing systems do not fully address. We
introduce VeriGuard, a novel framework that provides formal safety guarantees
for LLM-based agents through a dual-stage architecture designed for robust and
verifiable correctness. The initial offline stage involves a comprehensive
validation process. It begins by clarifying user intent to establish precise
safety specifications. VeriGuard then synthesizes a behavioral policy and
subjects it to both testing and formal verification to prove its compliance
with these specifications. This iterative process refines the policy until it
is deemed correct. Subsequently, the second stage provides online action
monitoring, where VeriGuard operates as a runtime monitor to validate each
proposed agent action against the pre-verified policy before execution. This
separation of the exhaustive offline validation from the lightweight online
monitoring allows formal guarantees to be practically applied, providing a
robust safeguard that substantially improves the trustworthiness of LLM agents.
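To make the dual-stage architecture concrete, below is a minimal, illustrative Python sketch of the flow the abstract describes. The paper does not publish code, so every name here (SafetySpec, Policy, offline_validate, monitor) is a hypothetical stand-in: real policy synthesis would be LLM-driven and real verification would use a formal tool, not the toy whitelist checks shown.

```python
# Illustrative sketch of VeriGuard's two stages, under assumed interfaces.
# Stage 1 (offline): synthesize a policy, test it against safety specs,
# and refine until no violations remain (a stand-in for formal verification).
# Stage 2 (online): a lightweight runtime gate checks each proposed action
# against the pre-verified policy before execution.

from dataclasses import dataclass
from typing import Callable, Dict, List

Action = Dict[str, str]  # e.g. {"tool": "db_query"}

@dataclass
class SafetySpec:
    """A machine-checkable constraint derived from clarified user intent."""
    name: str
    allows: Callable[[Action], bool]

@dataclass
class Policy:
    """A behavioral policy; here, simply a whitelist of permitted tools."""
    permitted_tools: List[str]

    def allows(self, action: Action) -> bool:
        return action.get("tool") in self.permitted_tools

def offline_validate(specs: List[SafetySpec],
                     candidate_tools: List[str],
                     test_actions: List[Action]) -> Policy:
    """Stage 1: iteratively refine the policy until it satisfies all specs."""
    policy = Policy(permitted_tools=list(candidate_tools))  # initial synthesis
    while True:
        # "Testing": find actions the policy permits but a spec forbids.
        violations = [a for a in test_actions
                      if policy.allows(a) and not all(s.allows(a) for s in specs)]
        if not violations:
            return policy  # toy stand-in for a proof of compliance
        for a in violations:  # refinement: drop tools implicated in violations
            if a["tool"] in policy.permitted_tools:
                policy.permitted_tools.remove(a["tool"])

def monitor(action: Action, policy: Policy, specs: List[SafetySpec]) -> bool:
    """Stage 2: validate each proposed action before it is executed."""
    return policy.allows(action) and all(s.allows(action) for s in specs)

if __name__ == "__main__":
    no_deletes = SafetySpec("no_deletes", lambda a: a.get("tool") != "db_delete")
    policy = offline_validate(
        specs=[no_deletes],
        candidate_tools=["db_query", "db_delete"],
        test_actions=[{"tool": "db_query"}, {"tool": "db_delete"}],
    )
    print(monitor({"tool": "db_query"}, policy, [no_deletes]))   # True: permitted
    print(monitor({"tool": "db_delete"}, policy, [no_deletes]))  # False: blocked
```

The sketch mirrors the separation the abstract emphasizes: the expensive refine-and-verify loop runs once offline, while the online monitor reduces to a cheap per-action check against the already-verified policy.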