Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation
February 10, 2026
Authors: Zhisheng Qi, Utkarsh Sahu, Li Ma, Haoyu Han, Ryan Rossi, Franck Dernoncourt, Mahantesh Halappanavar, Nesreen Ahmed, Yushun Dong, Yue Zhao, Yu Zhang, Yu Wang
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications, including enterprise chatbots, healthcare assistants, and agentic memory management. However, recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries, raising serious concerns about intellectual property theft and privacy leakage. While prior work has explored individual attack and defense techniques, the research landscape remains fragmented: studies use heterogeneous retrieval embeddings and diverse generation models, and they evaluate with non-standardized metrics on inconsistent datasets. To address this gap, we introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems. Our benchmark covers a broad spectrum of attack and defense strategies, representative retrieval embedding models, and both open- and closed-source generators, all evaluated under a unified experimental framework with standardized protocols across multiple datasets. By consolidating the experimental landscape and enabling reproducible, comparable evaluation, this benchmark provides actionable insights and a practical foundation for developing privacy-preserving RAG systems in the face of emerging knowledge-extraction threats. Our code is publicly available.