UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

Abstract

User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. A balance must therefore be struck between removing information and keeping the model's other abilities intact; failing to strike it leads to either incomplete deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods and find that it consistently achieves a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric that measures the area under the curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and the pruned points while reducing negative transfer from the forget set to points outside of it.
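The pruning idea and the AUC-style metric can be illustrated with a small sketch. The abstract does not specify which outlier detector is used or which quantities span the curve, so everything below is an assumption made for illustration: outliers are taken to be the points farthest from the centroid of the forget-set representations, the curve is taken to be a utility score plotted against a forgetting score at successive unlearning checkpoints, and the names `prune_forget_set`, `keep_fraction`, `total_variance`, and `tradeoff_auc` are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean vector of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def total_variance(points):
    """Mean squared Euclidean distance of the points to their centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)


def prune_forget_set(points, keep_fraction=0.9):
    """Return sorted indices of the points kept as the coreset.

    Assumption: a point counts as an outlier if it is among the farthest
    from the centroid. Any other outlier detector could be swapped in.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    c = centroid(points)
    by_distance = sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
    n_keep = max(1, round(keep_fraction * len(points)))
    return sorted(by_distance[:n_keep])


def tradeoff_auc(forget_scores, utility_scores):
    """Trapezoidal area under a utility-vs-forgetting curve.

    Assumption: each (forget, utility) pair comes from one checkpoint
    recorded during unlearning; pairs are sorted by the forgetting score.
    """
    pairs = sorted(zip(forget_scores, utility_scores))
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])
    )


# Toy data: 50 synthetic 4-dimensional "representations", 5 of them shifted far away.
random.seed(0)
reps = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
for p in reps[:5]:
    for d in range(4):
        p[d] += 10.0

kept = prune_forget_set(reps, keep_fraction=0.9)
coreset = [reps[i] for i in kept]
print("variance before pruning:", total_variance(reps))
print("variance after pruning: ", total_variance(coreset))
```

In this toy setting the shifted points are the ones dropped, and the variance of the remaining coreset is lower than that of the full set. Unlearning would then be run on the coreset only, with `tradeoff_auc` summarizing how much utility is retained as forgetting progresses.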
