UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

Abstract

User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This involves deleting, or "forgetting," a set of data points from an already-trained model, which typically degrades its performance on other data points. A balance must therefore be struck between removing information and keeping the model's other abilities intact: failing to manage this trade-off leads either to poor deletion or to an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Having found that model damage correlates with the variance of the model's representations on the forget set, we selectively prune outliers from the forget set, thereby minimizing model degradation after unlearning. We evaluate UPCORE with three standard unlearning methods and find that it consistently achieves a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric that measures the area under the curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefiting from positive transfer between the coreset and the pruned points while reducing negative transfer from the forget set to points outside of it.
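The two technical ideas in this abstract, pruning high-variance outliers from the forget set and scoring the deletion/utility trade-off with an AUC, can be illustrated with a small sketch. The abstract specifies neither the outlier detector nor how the AUC is computed, so both are assumptions here: `prune_outliers` (a hypothetical helper) keeps the forget-set points closest to the centroid of their representations and drops the farthest fraction, and `trade_off_auc` (also hypothetical) integrates (deletion efficacy, utility) pairs with the trapezoidal rule. The representations are plain vectors standing in for hidden states extracted from the model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Mean representation vector of the forget set."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def total_variance(points: List[Vector]) -> float:
    """Sum of per-dimension (population) variances of the representations."""
    c = centroid(points)
    n = len(points)
    return sum(sum((p[d] - c[d]) ** 2 for p in points) / n for d in range(len(c)))


def prune_outliers(points: List[Vector], prune_frac: float) -> List[int]:
    """Hypothetical helper: return sorted indices of the retained coreset.

    Drops the prune_frac fraction of points farthest (Euclidean distance)
    from the centroid. ASSUMPTION: this simple rule stands in for whatever
    outlier detector UPCORE actually uses.
    """
    c = centroid(points)
    dist = [math.dist(p, c) for p in points]
    n_keep = len(points) - int(prune_frac * len(points))
    closest_first = sorted(range(len(points)), key=lambda i: dist[i])
    return sorted(closest_first[:n_keep])


def trade_off_auc(curve: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: trapezoidal area under a trade-off curve.

    ASSUMPTION: each pair is (deletion efficacy, utility) recorded at one
    unlearning checkpoint; pairs are sorted along the deletion axis.
    """
    pts = sorted(curve)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy "representations": four clustered points and one far outlier.
forget_reps = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (3.0, 3.0)]
core = prune_outliers(forget_reps, prune_frac=0.2)
print(core)  # [0, 1, 2, 3]
print(total_variance(forget_reps), total_variance([forget_reps[i] for i in core]))
print(trade_off_auc([(0.0, 1.0), (0.5, 0.8), (1.0, 0.4)]))
```

On the toy data, the distant point at (3.0, 3.0) is pruned and the total variance of the remaining coreset drops sharply, which is the property the abstract links to reduced model damage. A distance-to-centroid rule was chosen only because it is easy to verify by hand; a dedicated outlier detector would be a natural substitute.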