通過設計擊敗提示注入攻擊
Defeating Prompt Injections by Design
March 24, 2025
作者: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr
cs.AI
摘要
大型語言模型(LLMs)正日益被部署於與外部環境互動的代理系統中。然而,LLM代理在處理不可信數據時容易受到提示注入攻擊的威脅。本文提出CaMeL,一種強大的防禦機制,它在LLM周圍建立了一個保護性系統層,即使底層模型可能易受攻擊,也能確保其安全。CaMeL通過明確提取(可信)查詢中的控制流和數據流來運作;因此,LLM檢索到的不可信數據永遠無法影響程序流程。為了進一步提升安全性,CaMeL依賴於一種能力概念,以防止私人數據通過未經授權的數據流外洩。我們通過在AgentDojo [NeurIPS 2024]——一個最新的代理安全基準測試中,解決了67%具有可證明安全性的任務,展示了CaMeL的有效性。
English
Large Language Models (LLMs) are increasingly deployed in agentic systems
that interact with an external environment. However, LLM agents are vulnerable
to prompt injection attacks when handling untrusted data. In this paper we
propose CaMeL, a robust defense that creates a protective system layer around
the LLM, securing it even when underlying models may be susceptible to attacks.
To operate, CaMeL explicitly extracts the control and data flows from the
(trusted) query; therefore, the untrusted data retrieved by the LLM can never
impact the program flow. To further improve security, CaMeL relies on a notion
of a capability to prevent the exfiltration of private data over unauthorized
data flows. We demonstrate effectiveness of CaMeL by solving 67% of tasks
with provable security in AgentDojo [NeurIPS 2024], a recent agentic security
benchmark.Summary
AI-Generated Summary