Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
February 1, 2025
Authors: Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to
generate grounded responses by leveraging external knowledge databases without
altering model parameters. Although the absence of weight tuning prevents
leakage via model parameters, it introduces the risk of inference adversaries
exploiting retrieved documents in the model's context. Existing methods for
membership inference and data extraction often rely on jailbreaking or
carefully crafted unnatural queries, which can be easily detected or thwarted
with query rewriting techniques common in RAG systems. In this work, we present
Interrogation Attack (IA), a membership inference technique targeting documents
in the RAG datastore. By crafting natural-text queries that are answerable only
with the target document's presence, our approach demonstrates successful
inference with just 30 queries while remaining stealthy; straightforward
detectors identify adversarial prompts from existing methods up to ~76x more
frequently than those generated by our attack. We observe a 2x improvement in
TPR@1%FPR over prior inference attacks across diverse RAG configurations, all
while costing less than $0.02 per document inference.
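The core idea of the attack — generating natural questions answerable only if the target document is in the datastore, then thresholding on how many the RAG system answers correctly — can be sketched as below. This is a minimal illustration, not the authors' implementation: `generate_questions`, `rag_answer`, and the fact-based document format are stand-ins invented for this example.

```python
def generate_questions(document, n=30):
    """Craft natural-text questions answerable only from `document`.
    Stand-in: questions are derived directly from document facts."""
    return [(f"What is the {k}?", v) for k, v in document["facts"][:n]]

def rag_answer(question, datastore):
    """Stand-in RAG pipeline: answers only when some document in the
    datastore contains the queried fact; otherwise it abstains."""
    key = question[len("What is the "):-1]
    for doc in datastore:
        for k, v in doc["facts"]:
            if k == key:
                return v
    return "I don't know."

def infer_membership(document, datastore, threshold=0.5, n=30):
    """Infer membership when the fraction of correctly answered
    document-specific questions exceeds `threshold`."""
    qa = generate_questions(document, n)
    correct = sum(rag_answer(q, datastore) == a for q, a in qa)
    return correct / len(qa) >= threshold

# Toy example with invented documents:
member = {"facts": [("capital of Atlantis", "Poseidonia"),
                    ("founder of Atlantis", "Poseidon")]}
non_member = {"facts": [("capital of Lemuria", "Mu"),
                        ("founder of Lemuria", "Nobody")]}
datastore = [member]
print(infer_membership(member, datastore))      # True
print(infer_membership(non_member, datastore))  # False
```

In the actual attack the question generation and answer grading are done with an LLM against a real RAG system, and the query budget (30 here) and decision threshold are what keep the attack both cheap and stealthy.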