ChatPaper.ai

Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models

March 31, 2026
作者: Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi
cs.AI

Abstract

Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and impracticality for processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
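The distillation described in the abstract — training a small student classifier to reproduce a large teacher model's privacy-sensitivity judgments — is commonly implemented with a temperature-scaled soft-label objective. The sketch below is illustrative only: the loss form, the three-class sensitivity scheme, and the temperature value are assumptions, not details taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """KL divergence between the teacher's soft privacy labels and the
    student's temperature-softened predictions, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures."""
    student_probs = softmax([z / temperature for z in student_logits])
    kl = sum(p * math.log(p / q)
             for p, q in zip(teacher_probs, student_probs) if p > 0)
    return temperature ** 2 * kl

# Hypothetical example: teacher (the large LLM judge) assigns soft
# probabilities over three sensitivity classes (low / medium / high);
# the student encoder produces logits for the same text.
teacher = [0.1, 0.2, 0.7]
loss = distillation_loss([0.5, 1.0, 2.5], teacher)
```

At temperature 1, the loss reaches zero exactly when the student's distribution matches the teacher's, which is the property that lets a 150M-parameter encoder inherit the larger model's assessments from labeled text alone.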
PDF · April 2, 2026