Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
July 14, 2024
Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen
cs.AI
Abstract
Large language models (LLMs) exhibit remarkable capabilities in understanding
and generating natural language. However, these models can inadvertently
memorize private information, posing significant privacy risks. This study
addresses the challenge of enabling LLMs to protect specific individuals'
private data without the need for complete retraining. We propose RETURN, a
Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from
Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods
for protecting personal data in a realistic scenario. Additionally, we
introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection,
which enables the model to learn which individuals' information should be
protected without affecting its ability to answer questions related to other
unrelated individuals. Our extensive experiments demonstrate that NAUF achieves
a state-of-the-art average unlearning score, surpassing the best baseline
method by 5.65 points, effectively protecting target individuals' personal data
while maintaining the model's general capabilities.
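To make the evaluation setup concrete, below is a minimal sketch, assuming a simple person-keyed QA-pair format, of how forget/retain training examples could be assembled for name-aware refusal. The `QAPair` fields, the `build_unlearning_examples` helper, and the refusal template are hypothetical illustrations for exposition, not the paper's released implementation.

```python
# Minimal sketch (assumption, not the authors' code): split QA pairs into a
# forget set (protected individuals, mapped to refusals) and a retain set
# (other individuals, mapped to their original answers).
from dataclasses import dataclass
from typing import List, Set, Dict

@dataclass
class QAPair:
    person: str      # individual the question is about
    question: str
    answer: str

# Hypothetical refusal target used for protected individuals.
REFUSAL_TEMPLATE = "I'm sorry, but I cannot share personal information about {person}."

def build_unlearning_examples(qa_pairs: List[QAPair],
                              forget_names: Set[str]) -> List[Dict[str, str]]:
    """Map each QA pair to a training target: a refusal if the person is in
    the forget set, otherwise the original answer (retain set)."""
    examples = []
    for qa in qa_pairs:
        in_forget = qa.person in forget_names
        target = REFUSAL_TEMPLATE.format(person=qa.person) if in_forget else qa.answer
        examples.append({
            "prompt": qa.question,
            "target": target,
            "split": "forget" if in_forget else "retain",
        })
    return examples

# Toy usage with invented individuals.
pairs = [
    QAPair("Alice Example", "Where was Alice Example born?", "Springfield."),
    QAPair("Bob Example", "What is Bob Example's profession?", "He is a chemist."),
]
print(build_unlearning_examples(pairs, forget_names={"Alice Example"}))
```

Under this kind of split, an unlearning method would be scored on refusing forget-set questions while still answering retain-set questions correctly, which mirrors the forget/retain trade-off the abstract describes.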