Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
July 14, 2024
Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen
cs.AI
Abstract
Large language models (LLMs) exhibit remarkable capabilities in understanding
and generating natural language. However, these models can inadvertently
memorize private information, posing significant privacy risks. This study
addresses the challenge of enabling LLMs to protect specific individuals'
private data without the need for complete retraining. We propose \return, a
Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from
Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods
for protecting personal data in a realistic scenario. Additionally, we
introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection,
which enables the model to learn which individuals' information should be
protected without affecting its ability to answer questions related to other
unrelated individuals. Our extensive experiments demonstrate that NAUF achieves
a state-of-the-art average unlearning score, surpassing the best baseline
method by 5.65 points, effectively protecting target individuals' personal data
while maintaining the model's general capabilities.
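The abstract does not spell out how the Name-Aware Unlearning Framework is trained, but it describes the core idea: the model should refuse questions about protected individuals while still answering questions about everyone else. As a rough, hypothetical illustration only (not the paper's implementation), the sketch below shows one way refusal-style fine-tuning targets could be assembled from per-individual QA pairs; the dataset schema, function names, and refusal template are all assumptions made for this example.

```python
# Hypothetical sketch of building refusal-style unlearning targets.
# QA pairs about protected individuals are mapped to a refusal response;
# QA pairs about other individuals keep their original answers, so the
# model's ability to answer unrelated questions is preserved.

from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed refusal template; the actual wording used by NAUF may differ.
REFUSAL_TEMPLATE = "I'm sorry, but I can't share personal information about {name}."


@dataclass
class QAPair:
    name: str       # the individual the question is about
    question: str
    answer: str


def build_unlearning_targets(qa_pairs: List[QAPair],
                             protected_names: Set[str]) -> List[Dict[str, str]]:
    """Map each QA pair to a fine-tuning target: refuse for protected names,
    keep the original answer otherwise."""
    examples = []
    for qa in qa_pairs:
        if qa.name in protected_names:
            target = REFUSAL_TEMPLATE.format(name=qa.name)
        else:
            target = qa.answer
        examples.append({"prompt": qa.question, "target": target})
    return examples


if __name__ == "__main__":
    # Toy data with made-up individuals, purely for illustration.
    data = [
        QAPair("Alice Example", "Where does Alice Example live?", "In Springfield."),
        QAPair("Bob Example", "What does Bob Example do?", "He is a teacher."),
    ]
    for ex in build_unlearning_targets(data, protected_names={"Alice Example"}):
        print(ex)
```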