Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models
March 31, 2026
Authors: Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi
cs.AI
Abstract
Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and impracticality for processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
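The transfer described above follows the standard knowledge-distillation recipe: the large teacher model produces privacy labels (or soft label distributions) for a corpus, and a small encoder classifier is trained to match them. As a minimal sketch of that objective, the snippet below computes the usual temperature-softened cross-entropy between teacher soft labels and student logits; the function names, temperature value, and loss form are illustrative assumptions, not the paper's exact training setup:

```python
# Sketch of a soft-label distillation objective for transferring an LLM
# teacher's privacy-sensitivity ratings into a small student classifier.
# All names and hyperparameters here are illustrative assumptions.
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """Cross-entropy between the teacher's soft label distribution and the
    student's temperature-softened distribution (standard KD objective)."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

In practice the student here would be a ~150M-parameter encoder fine-tuned over the 10-domain annotated corpus; a hard-label variant (training directly on the teacher's discrete privacy ratings) reduces to ordinary cross-entropy classification.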