Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
February 1, 2025
Authors: Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to
generate grounded responses by leveraging external knowledge databases without
altering model parameters. Although the absence of weight tuning prevents
leakage via model parameters, it introduces the risk of inference adversaries
exploiting retrieved documents in the model's context. Existing methods for
membership inference and data extraction often rely on jailbreaking or
carefully crafted unnatural queries, which can be easily detected or thwarted
with query rewriting techniques common in RAG systems. In this work, we present
Interrogation Attack (IA), a membership inference technique targeting documents
in the RAG datastore. By crafting natural-text queries that are answerable only
with the target document's presence, our approach demonstrates successful
inference with just 30 queries while remaining stealthy; straightforward
detectors identify adversarial prompts from existing methods up to ~76x more
frequently than those generated by our attack. We observe a 2x improvement in
TPR@1%FPR over prior inference attacks across diverse RAG configurations, all
while costing less than $0.02 per document inference.
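The core idea of the attack — generating natural questions answerable only if the target document is in the datastore, then thresholding on how many the RAG system answers correctly — can be sketched as below. This is a minimal illustration, not the authors' implementation: `generate_questions`, `rag_answer`, and the fact-based document format are stand-ins invented for this example.

```python
def generate_questions(document, n=30):
    """Craft natural-text questions answerable only from `document`.
    Stand-in: questions are derived directly from document facts."""
    return [(f"What is the {k}?", v) for k, v in document["facts"][:n]]

def rag_answer(question, datastore):
    """Stand-in RAG pipeline: answers only when some document in the
    datastore contains the queried fact; otherwise it abstains."""
    key = question[len("What is the "):-1]
    for doc in datastore:
        for k, v in doc["facts"]:
            if k == key:
                return v
    return "I don't know."

def infer_membership(document, datastore, threshold=0.5, n=30):
    """Infer membership when the fraction of correctly answered
    document-specific questions exceeds `threshold`."""
    qa = generate_questions(document, n)
    correct = sum(rag_answer(q, datastore) == a for q, a in qa)
    return correct / len(qa) >= threshold

# Toy example with invented documents:
member = {"facts": [("capital of Atlantis", "Poseidonia"),
                    ("founder of Atlantis", "Poseidon")]}
non_member = {"facts": [("capital of Lemuria", "Mu"),
                        ("founder of Lemuria", "Nobody")]}
datastore = [member]
print(infer_membership(member, datastore))      # True
print(infer_membership(non_member, datastore))  # False
```

In the actual attack the question generation and answer grading are done with an LLM against a real RAG system, and the query budget (30 here) and decision threshold are what keep the attack both cheap and stealthy.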