Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

Abstract

Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples: instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
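As a rough illustration of the first step only (splitting a forget set by per-sample sharpness), the sketch below uses a toy logistic model and a simple sharpness proxy: the average loss increase under small random weight perturbations. The proxy, the median threshold, and every name here (`sharpness_proxy`, `partition_by_sharpness`, `radius`, `n_draws`) are assumptions made for illustration and are not taken from CUPID. The abstract does not say which sharpness regime corresponds to the causal-approximated or bias-approximated subset, so the sketch only returns a low-sharpness and a high-sharpness group.

```python
import numpy as np

def per_sample_loss(w, X, y):
    """Binary cross-entropy of a linear logistic model, one value per sample."""
    logits = X @ w
    # log(1 + exp(z)) - y * z, computed in a numerically stable way
    return np.logaddexp(0.0, logits) - y * logits

def sharpness_proxy(w, X, y, radius=0.05, n_draws=32, seed=0):
    """Mean per-sample loss increase under random weight perturbations
    of fixed norm `radius` (an assumed proxy, not the paper's measure)."""
    rng = np.random.default_rng(seed)
    base = per_sample_loss(w, X, y)
    increase = np.zeros_like(base)
    for _ in range(n_draws):
        d = rng.normal(size=w.shape)
        d *= radius / np.linalg.norm(d)  # rescale to the chosen radius
        increase += per_sample_loss(w + d, X, y) - base
    return increase / n_draws

def partition_by_sharpness(scores):
    """Split sample indices at the median score (hypothetical threshold rule)."""
    threshold = np.median(scores)
    low = np.flatnonzero(scores <= threshold)
    high = np.flatnonzero(scores > threshold)
    return low, high

# Toy forget set: random features, labels, and weights stand in for real data.
rng = np.random.default_rng(1)
X_forget = rng.normal(size=(20, 5))
y_forget = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=5)

scores = sharpness_proxy(w, X_forget, y_forget)
low_idx, high_idx = partition_by_sharpness(scores)
```

In a full pipeline, the two index sets would feed the later steps described above (pathway disentanglement and gradient routing), which this sketch does not attempt to reproduce.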
