Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

Abstract

Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations in the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," in which models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples: instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets, including Waterbirds, BAR, and Biased NICO++, demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
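To make the first step (partitioning the forget set by sample sharpness) more concrete, the following is a minimal sketch under stated assumptions, not the CUPID implementation. The abstract does not say how sharpness is measured, how the split threshold is chosen, or which subset counts as causal-approximated and which as bias-approximated. The toy logistic model, the SAM-style one-step sharpness proxy, the median threshold, and all function names here are hypothetical choices made purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a toy linear logistic model on one sample."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def grad(w, x, y):
    """Gradient of the per-sample loss with respect to the weights."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def sharpness(w, x, y, rho=0.05):
    """Assumed proxy: loss increase after one normalized ascent step of size rho."""
    g = grad(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return loss(w_perturbed, x, y) - loss(w, x, y)


def partition_by_sharpness(w, forget_set, rho=0.05):
    """Split forget-set indices into low- and high-sharpness groups.

    The median threshold is an assumption; which group plays the
    causal-approximated or bias-approximated role is not stated in
    the abstract and is left open here.
    """
    scores = [sharpness(w, x, y, rho) for x, y in forget_set]
    threshold = sorted(scores)[len(scores) // 2]
    low = [i for i, s in enumerate(scores) if s < threshold]
    high = [i for i, s in enumerate(scores) if s >= threshold]
    return low, high, scores
```

In a real setting the per-sample sharpness would be computed on the trained network rather than a toy linear model, and the two resulting subsets would feed the later steps the abstract describes (parameter disentanglement and gradient routing), which this sketch does not attempt to cover.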
