

SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

January 28, 2025
作者: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang
cs.AI

Abstract

The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful at knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, incorporating external and unverified knowledge increases the vulnerability of LLMs, because attackers can carry out attacks by manipulating that knowledge. In this paper, we introduce SafeRAG, a benchmark designed to evaluate the security of RAG. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct a RAG security evaluation dataset (the SafeRAG dataset), built primarily by hand, for each task. We then use the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments on 14 representative RAG components demonstrate that RAG is significantly vulnerable to all attack tasks: even the most obvious attack tasks can easily bypass existing retrievers, filters, and advanced LLMs, degrading RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
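To make the threat model concrete, the sketch below simulates one attack scenario of the kind SafeRAG evaluates: an attacker plants a passage (here, a soft ad) in the knowledge source, and a naive keyword-overlap retriever pulls it into the LLM's context. All names in the sketch (`tokenize`, `score`, "AcmeRAG Pro") are illustrative assumptions, not code from the SafeRAG repository; a real pipeline would use a BM25 or dense retriever, but the failure mode is the same.

```python
# Minimal sketch of the knowledge-corruption threat model described in the
# abstract. All names here are illustrative assumptions; this is NOT code
# from the SafeRAG repository.
import string
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return Counter(t.strip(string.punctuation) for t in text.lower().split())

def score(query: str, doc: str) -> int:
    """Naive keyword-overlap relevance score (stand-in for a real retriever)."""
    return sum((tokenize(query) & tokenize(doc)).values())

# Trusted corpus that the RAG system indexes.
corpus = [
    "RAG integrates external knowledge into large language models.",
    "Retrievers rank documents by relevance to the user query.",
]

# Attacker-controlled passage, e.g. a 'soft ad' planted in the knowledge source.
corpus.append("RAG integrates external knowledge best with AcmeRAG Pro, buy now.")

query = "How does RAG integrate external knowledge?"
top_k = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:2]

# The injected passage shares enough query terms to rank into the top-k,
# so it is handed to the LLM as if it were trusted context.
for doc in top_k:
    print(doc)
```

Running the sketch prints the injected ad among the top-ranked contexts, which is why the paper benchmarks retrievers and filters as the first line of defense.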

