ReLearn: Unlearning via Learning for Large Language Models
February 16, 2025
作者: Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Current unlearning methods for large language models usually rely on reverse
optimization to reduce target token probabilities. However, this paradigm
disrupts the prediction of subsequent tokens, degrading model performance and
linguistic coherence. Moreover, existing evaluation metrics overemphasize
contextual forgetting while inadequately assessing response fluency and
relevance. To address these challenges, we propose ReLearn, a data augmentation
and fine-tuning pipeline for effective unlearning, along with a comprehensive
evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR)
and Knowledge Retention Rate (KRR) to measure forgetting and retention at the knowledge level, and
Linguistic Score (LS) to evaluate generation quality. Our experiments show that
ReLearn successfully achieves targeted forgetting while preserving high-quality
output. Through mechanistic analysis, we further demonstrate how reverse
optimization disrupts coherent text generation, while ReLearn preserves this
essential capability. Code is available at https://github.com/zjunlp/unlearn.
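
To make the contrast in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released implementation; see the repository above for that). It assumes a Hugging Face-style causal LM whose forward pass returns a cross-entropy `loss` when `labels` are provided, and contrasts a reverse-optimization objective, which performs gradient ascent on forget-set tokens, with a ReLearn-style objective that simply fine-tunes on augmented replacement data using the ordinary language-modeling loss.

```python
# Minimal, hypothetical sketch (not the authors' released code). It assumes a
# Hugging Face-style causal LM whose forward pass returns a cross-entropy
# `loss` when `labels` are provided.

def reverse_optimization_loss(model, input_ids, labels):
    """Gradient-ascent-style unlearning: negate the NLL on the forget set.

    Minimizing this pushes down the probability of the targeted tokens, the
    paradigm the abstract argues disrupts prediction of subsequent tokens.
    """
    out = model(input_ids=input_ids, labels=labels)
    return -out.loss


def relearn_style_loss(model, aug_input_ids, aug_labels):
    """Unlearning via learning: ordinary fine-tuning NLL on augmented data.

    `aug_input_ids` / `aug_labels` stand for augmented question-answer pairs
    whose answers no longer expose the targeted knowledge, so the model keeps
    generating coherent text while the sensitive content is overwritten.
    """
    out = model(input_ids=aug_input_ids, labels=aug_labels)
    return out.loss


# Example training step (names are illustrative):
#   loss = relearn_style_loss(model, batch["input_ids"], batch["labels"])
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```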