Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models

Abstract

Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and impracticality for processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
