Easy to Learn, Hard to Forget: Toward Robust Unlearning Under Bias

Abstract

Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations in the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," in which models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples: instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal-approximated and bias-approximated subsets based on sample sharpness, then disentangles the model parameters into causal and bias pathways, and finally performs a targeted update by routing the refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets, including Waterbirds, BAR, and Biased NICO++, demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
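To make the first step of the method more concrete, the following is a minimal sketch of splitting a forget set by per-sample sharpness. It is an illustration under stated assumptions, not the CUPID implementation: the toy logistic model, the sharpness proxy (loss increase after a small step of radius `rho` along the normalized per-sample gradient), the median threshold, and all function names are hypothetical. The abstract does not state which subset (sharper or flatter) approximates the causal samples and which the bias samples, so the sketch labels the two groups only as low and high sharpness.

```python
import math

def logistic_loss(w, x, y):
    """Binary cross-entropy of a linear model on one sample (y in {0, 1})."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Stable form of log(1 + exp(z)) - y * z
    return max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z

def logistic_grad(w, x, y):
    """Gradient of the per-sample loss with respect to the weights."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]

def sample_sharpness(w, x, y, rho=0.05):
    """Assumed proxy: loss increase after a step of size rho along the gradient."""
    g = logistic_grad(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    return logistic_loss(w_perturbed, x, y) - logistic_loss(w, x, y)

def partition_by_sharpness(w, samples, rho=0.05):
    """Split sample indices at the (upper) median sharpness score."""
    scores = [sample_sharpness(w, x, y, rho) for x, y in samples]
    threshold = sorted(scores)[len(scores) // 2]
    low = [i for i, s in enumerate(scores) if s < threshold]
    high = [i for i, s in enumerate(scores) if s >= threshold]
    return low, high, scores

# Hypothetical toy forget set: (features, label) pairs.
weights = [1.0, -0.5]
forget_set = [([2.0, 0.1], 1), ([0.2, 0.1], 1), ([1.5, -1.0], 0), ([-0.3, 0.4], 0)]
low, high, scores = partition_by_sharpness(weights, forget_set)
print("low-sharpness indices:", low)
print("high-sharpness indices:", high)
```

In a full pipeline, the two index groups would feed the later steps described in the abstract (separate causal and bias gradients routed to their own parameter pathways); those steps are not sketched here because the abstract gives no detail on how the pathways are defined.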