To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
July 2, 2024
Authors: Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large Language Models (LLMs) trained on extensive corpora inevitably retain
sensitive data, such as personal privacy information and copyrighted material.
Recent advancements in knowledge unlearning involve updating LLM parameters to
erase specific knowledge. However, current unlearning paradigms are mired in
vague forgetting boundaries, often erasing knowledge indiscriminately. In this
work, we introduce KnowUnDo, a benchmark containing copyrighted content and
user privacy domains to evaluate if the unlearning process inadvertently erases
essential knowledge. Our findings indicate that existing unlearning methods
often suffer from excessive unlearning. To address this, we propose a simple
yet effective method, MemFlex, which utilizes gradient information to precisely
target and unlearn sensitive parameters. Experimental results show that MemFlex
is superior to existing methods in both precise knowledge unlearning and
general knowledge retention in LLMs. Code and dataset will be released at
https://github.com/zjunlp/KnowUnDo.
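
The abstract describes MemFlex as using gradient information to precisely target and unlearn sensitive parameters. The sketch below is a minimal, hypothetical illustration of that general idea, not the released MemFlex implementation: it contrasts gradient magnitudes on a "forget" batch and a "retain" batch, masks the parameter entries where the forget signal dominates, and applies a gradient-ascent unlearning step only to those entries. The toy model, thresholds, and synthetic data are placeholder assumptions chosen for the demo.

# Illustrative sketch of gradient-based localization for unlearning (assumed, not the authors' code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an LLM: a small classifier.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()

def batch_grads(x, y):
    # Gradients of the loss on one batch, one tensor per parameter.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return [p.grad.detach().clone() for p in model.parameters()]

# Synthetic "forget" data (e.g. private or copyrighted samples) and "retain" data.
x_forget, y_forget = torch.randn(8, 16), torch.randint(0, 4, (8,))
x_retain, y_retain = torch.randn(8, 16), torch.randint(0, 4, (8,))

g_forget = batch_grads(x_forget, y_forget)
g_retain = batch_grads(x_retain, y_retain)

# Mark a parameter entry "sensitive" if the forget-gradient is large while the
# retain-gradient is small; only these entries will be updated.
forget_thresh, retain_thresh = 1e-2, 1e-2
masks = [(gf.abs() > forget_thresh) & (gr.abs() < retain_thresh)
         for gf, gr in zip(g_forget, g_retain)]

# One masked gradient-ascent step on the forget loss (ascent = unlearning),
# leaving all other parameters untouched to preserve general knowledge.
lr = 1e-3
with torch.no_grad():
    for p, gf, m in zip(model.parameters(), g_forget, masks):
        p.add_(lr * gf * m)

print("updated entries:", sum(int(m.sum()) for m in masks))

In this sketch the masking step is what keeps the unlearning localized; in practice the thresholds and the choice of ascent versus other unlearning losses would need to be tuned against a benchmark such as KnowUnDo.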