UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

Abstract

User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting, or "forgetting," a set of data points from an already-trained model, which typically degrades its performance on other data points. A balance must therefore be struck between removing the targeted information and keeping the model's other abilities intact; failing to balance this trade-off leads either to poor deletion or to an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods and find that it consistently achieves a better balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric that measures the area under the curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and the pruned points while reducing negative transfer from the forget set to points outside of it.
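The abstract names two mechanisms: pruning outliers from the forget set to lower the variance of its representations, and summarizing the deletion/utility trade-off as an area under a curve. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The abstract does not say which outlier detector UPCORE uses or which metrics form the axes of the trade-off curve, so the hypothetical `prune_outliers` uses squared distance to the centroid of the representations as a simple stand-in detector, and the hypothetical `tradeoff_auc` assumes a curve of (deletion efficacy, model utility) points, for example one per unlearning checkpoint, integrated with the trapezoidal rule. Representations are assumed to be precomputed vectors, one per forget-set example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(reps: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(reps[0])
    return [sum(r[d] for r in reps) / len(reps) for d in range(dim)]


def representation_variance(reps: Sequence[Vector]) -> float:
    """Average squared Euclidean distance of each vector to the centroid."""
    mu = mean_vector(reps)
    return sum(sum((x - m) ** 2 for x, m in zip(r, mu)) for r in reps) / len(reps)


def prune_outliers(reps: Sequence[Vector], keep_fraction: float) -> List[int]:
    """Return sorted indices of the keep_fraction of points closest to the centroid.

    Hypothetical stand-in for UPCORE's pruning step: points far from the
    centroid are treated as outliers and dropped; the rest form the coreset.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    mu = mean_vector(reps)
    dists = [sum((x - m) ** 2 for x, m in zip(r, mu)) for r in reps]
    n_keep = max(1, round(keep_fraction * len(reps)))
    # Stable sort: ties keep their original order.
    order = sorted(range(len(reps)), key=lambda i: dists[i])
    return sorted(order[:n_keep])


def tradeoff_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under (deletion, utility) points, sorted by deletion."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Toy 2-D "representations": three clustered points and one outlier.
    forget_reps = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    core = prune_outliers(forget_reps, keep_fraction=0.75)
    print(core)  # [0, 1, 2]
    print(representation_variance(forget_reps))
    print(representation_variance([forget_reps[i] for i in core]))
    # Made-up trade-off curve, for illustration only.
    print(tradeoff_auc([(0.0, 1.0), (0.5, 0.8), (1.0, 0.4)]))  # approximately 0.75
```

Dropping the points farthest from the centroid cannot increase the variance of what remains: the kept points have at most the average squared distance to the old centroid, and re-centering on their own mean can only lower that sum further.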
