Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
February 6, 2026
Authors: Yu-Che Tsai, Hsiang Hsiao, Kuan-Yu Chen, Shou-De Lin
cs.AI
Abstract
Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing differential privacy (DP) defenses assume uniform sensitivity across embedding dimensions, which leads to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings. SPARSE combines (1) differentiable mask learning to identify the privacy-sensitive dimensions for user-defined concepts, and (2) the Mahalanobis mechanism, which applies elliptical noise calibrated to per-dimension sensitivity. Unlike traditional spherical noise injection, SPARSE selectively perturbs privacy-sensitive dimensions while preserving non-sensitive semantics. Evaluated on six datasets, three embedding models, and multiple attack scenarios, SPARSE consistently reduces privacy leakage while achieving better downstream performance than state-of-the-art DP methods.
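The contrast between spherical and elliptical noise can be illustrated with a minimal Python sketch. This is not SPARSE's implementation. It assumes a diagonal covariance built from hypothetical per-dimension sensitivity weights w_i in [0, 1], which stand in for the learned mask, and a hypothetical regularization weight `lam` that mixes that covariance with the identity. The helper names `sensitivity_to_scales`, `mahalanobis_noise`, and `privatize` are illustrative only. Noise follows the usual Mahalanobis-mechanism construction: draw a uniform direction on the unit sphere, stretch it by Σ^{1/2}, and scale it by a Gamma(d, 1/ε) radius.

```python
import math
import random


def sensitivity_to_scales(sensitivity: list[float], lam: float) -> list[float]:
    """Hypothetical mapping from mask weights to per-dimension noise scales.

    Uses Sigma_reg = lam * diag(w) + (1 - lam) * I, so scale_i = sqrt(lam * w_i + 1 - lam).
    lam = 0 gives equal scales, i.e. ordinary spherical noise.
    """
    return [math.sqrt(lam * w + (1.0 - lam)) for w in sensitivity]


def sample_unit_sphere(d: int, rng: random.Random) -> list[float]:
    """Draw a direction uniformly from the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mahalanobis_noise(scales: list[float], epsilon: float, rng: random.Random) -> list[float]:
    """Elliptical noise z = r * Sigma^{1/2} u with diagonal Sigma^{1/2} = diag(scales).

    r ~ Gamma(shape=d, scale=1/epsilon) sets the magnitude; u sets the direction.
    """
    d = len(scales)
    u = sample_unit_sphere(d, rng)
    r = rng.gammavariate(d, 1.0 / epsilon)  # Python's gammavariate(alpha=shape, beta=scale)
    return [r * s * ui for s, ui in zip(scales, u)]


def privatize(embedding: list[float], sensitivity: list[float], epsilon: float,
              lam: float, rng: random.Random) -> list[float]:
    """Add sensitivity-shaped noise to one embedding (illustrative, no formal guarantee claimed)."""
    scales = sensitivity_to_scales(sensitivity, lam)
    noise = mahalanobis_noise(scales, epsilon, rng)
    return [x + z for x, z in zip(embedding, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical mask: dimensions 0 and 1 are "sensitive", 2 and 3 are not.
    sensitivity = [1.0, 1.0, 0.0, 0.0]
    scales = sensitivity_to_scales(sensitivity, lam=0.9)
    samples = [mahalanobis_noise(scales, epsilon=1.0, rng=rng) for _ in range(2000)]
    mean_abs = [sum(abs(z[i]) for z in samples) / len(samples) for i in range(4)]
    print("scales:", [round(s, 3) for s in scales])
    print("mean |noise| per dimension:", [round(m, 3) for m in mean_abs])
    print("privatized:", privatize([0.5, -0.2, 0.1, 0.3], sensitivity, 1.0, 0.9, rng))
```

With `lam=0.9`, the sensitive dimensions receive roughly three times more noise than the non-sensitive ones (scale 1.0 vs. about 0.316). Setting `lam=0` collapses the ellipse back to a sphere, which corresponds to the uniform-sensitivity baseline the abstract criticizes. A diagonal covariance keeps sampling at O(d). A full covariance would require a matrix square root.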