UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
February 20, 2025
Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
cs.AI
Abstract
User specifications or legal frameworks often require information to be
removed from pretrained models, including large language models (LLMs). This
requires deleting or "forgetting" a set of data points from an already-trained
model, which typically degrades its performance on other data points. Thus, a
balance must be struck between removing information and keeping the model's
other abilities intact, with a failure to balance this trade-off leading to
poor deletion or an unusable model. To this end, we propose UPCORE
(Utility-Preserving Coreset Selection), a method-agnostic data selection
framework for mitigating collateral damage during unlearning. Finding that the
model damage is correlated with the variance of the model's representations on
the forget set, we selectively prune the forget set to remove outliers, thereby
minimizing model degradation after unlearning. We evaluate UPCORE across three
standard unlearning methods, consistently achieving a superior balance between
the competing objectives of deletion efficacy and model preservation. To better
evaluate this trade-off, we introduce a new metric, measuring the
area-under-the-curve (AUC) across standard metrics. We find that UPCORE
improves both standard metrics and AUC, benefitting from positive transfer
between the coreset and pruned points while reducing negative transfer from the
forget set to points outside of it.
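The two ideas in the abstract, pruning outliers from the forget set to lower the variance of its representations and summarizing the deletion/utility trade-off as an area under a curve, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' procedure. The abstract does not say which outlier detector UPCORE uses, so distance from the centroid stands in for it. The function names (`total_variance`, `prune_outliers`, `tradeoff_auc`), the `prune_fraction` parameter, the toy 2-D "representations", and the choice of axes for the curve are all hypothetical.

```python
import math

def total_variance(points):
    """Sum of per-dimension variances of a set of vectors (trace of the covariance)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim)) / n

def prune_outliers(points, prune_fraction=0.1):
    """Split indices into (kept, pruned) by dropping the points farthest from the centroid.

    Assumption: centroid distance is a simple stand-in outlier score;
    the abstract does not specify the detector.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    dist = [math.dist(p, mean) for p in points]
    n_prune = int(n * prune_fraction)
    order = sorted(range(n), key=lambda i: dist[i])  # nearest to centroid first
    kept = sorted(order[: n - n_prune])
    pruned = sorted(order[n - n_prune:])
    return kept, pruned

def tradeoff_auc(xs, ys):
    """Trapezoidal area under a curve given as paired x/y samples."""
    pairs = sorted(zip(xs, ys))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]))

# Toy forget-set representations: four clustered points and one outlier.
reps = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
kept, pruned = prune_outliers(reps, prune_fraction=0.2)
coreset = [reps[i] for i in kept]

var_before = total_variance(reps)
var_after = total_variance(coreset)

# Hypothetical trade-off curve: x = deletion metric, y = utility metric,
# sampled at successive unlearning checkpoints.
auc = tradeoff_auc([0.0, 0.5, 1.0], [1.0, 0.8, 0.2])
```

In this toy setup the single far-away point is pruned and the variance of the remaining coreset drops sharply. Under the assumed axes, a larger area would mean that utility stays high for longer as deletion progresses.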