Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models

Abstract

Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost, and the impracticality of using them to process sensitive data at scale, limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
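The abstract reports agreement between the distilled classifiers and human annotations but does not name the statistic at this point. As an illustration only, suppose privacy sensitivity is rated on an ordinal scale of K levels (an assumption). A common way to quantify rater agreement on such a scale is quadratic-weighted Cohen's kappa, and the plain-Python sketch below computes it. The function name `quadratic_weighted_kappa`, the integer level encoding, and the choice of kappa itself are hypothetical and are not taken from this work.

```python
from collections import Counter

# Illustrative sketch only: the ordinal K-level scale and the choice of
# quadratic-weighted kappa as the agreement metric are assumptions.


def quadratic_weighted_kappa(human, model, num_levels):
    """Quadratic-weighted Cohen's kappa between two ordinal raters.

    Ratings are integers in [0, num_levels). Returns 1.0 for perfect
    agreement, about 0.0 for chance-level agreement, and negative values
    for systematic disagreement.
    """
    if num_levels < 2:
        raise ValueError("need at least two sensitivity levels")
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty rating lists of equal length")
    if any(not 0 <= r < num_levels for r in (*human, *model)):
        raise ValueError("ratings must be integers in [0, num_levels)")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts O[i, j]
    hist_h = Counter(human)                # marginal counts for the human rater
    hist_m = Counter(model)                # marginal counts for the model
    max_sq = (num_levels - 1) ** 2

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            w = (i - j) ** 2 / max_sq             # quadratic disagreement weight
            expected = hist_h[i] * hist_m[j] / n  # count expected under independence
            weighted_obs += w * observed[(i, j)]
            weighted_exp += w * expected

    # If both raters put every item in the same single level, there is no
    # disagreement to measure; treat that as perfect agreement.
    if weighted_exp == 0.0:
        return 1.0
    return 1.0 - weighted_obs / weighted_exp


# Hypothetical ratings on an assumed 5-level scale (0 = not sensitive).
human_labels = [0, 1, 2, 3, 4]
model_labels = [0, 1, 2, 3, 4]
print(quadratic_weighted_kappa(human_labels, model_labels, num_levels=5))  # 1.0
```

Quadratic weights are one common choice for ordinal scales because a prediction two levels away from the human rating is penalized four times as much as one that is a single level away. If scikit-learn is an acceptable dependency, `sklearn.metrics.cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic.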
