Machine Unlearning for Image-to-Image Generative Models
February 1, 2024
Authors: Guihong Li, Hsiang Hsu, Chun-Fu Chen, Radu Marculescu
cs.AI
Abstract
Machine unlearning has emerged as a new paradigm to deliberately forget data
samples from a given model in order to adhere to stringent regulations.
However, existing machine unlearning methods have primarily focused on
classification models, leaving the landscape of unlearning for generative
models relatively unexplored. This paper serves as a bridge, addressing the
gap by providing a unifying framework of machine unlearning for
image-to-image generative models. Within this framework, we propose a
computationally efficient algorithm, underpinned by rigorous theoretical
analysis, that demonstrates negligible performance degradation on the retain
samples while effectively removing the information from the forget samples.
Empirical studies on two large-scale datasets, ImageNet-1K and Places-365,
further show that our algorithm does not rely on the availability of the
retain samples, and thus also complies with data retention policies. To the
best of our knowledge, this work is the first systematic, theoretical, and
empirical exploration of machine unlearning specifically tailored for
image-to-image generative models. Our code is available at
https://github.com/jpmorganchase/l2l-generator-unlearning.
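
For readers unfamiliar with the setup, the sketch below illustrates the generic unlearning objective the abstract describes: fine-tune the model so that its outputs on forget samples become uninformative (here, matched to Gaussian noise) while its outputs on retain samples stay close to those of the original model. This is a minimal, hypothetical illustration under those assumptions, not the authors' algorithm; the names unlearn, forget_loader, and retain_loader are placeholders, and the loaders are assumed to yield image batches.

import copy
import itertools
import torch
import torch.nn.functional as F

def unlearn(model, forget_loader, retain_loader, steps=1000, lr=1e-5, alpha=1.0):
    # Hypothetical sketch, not the paper's method: a frozen copy of the
    # original model serves as the reference for the retain samples.
    frozen = copy.deepcopy(model).eval()
    for p in frozen.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    forget_iter = itertools.cycle(forget_loader)
    retain_iter = itertools.cycle(retain_loader)
    for _ in range(steps):
        x_f = next(forget_iter)  # batch of images to forget
        x_r = next(retain_iter)  # batch of images to retain

        # Forget term: push outputs on forget inputs toward pure Gaussian
        # noise, destroying the information the model carries about them.
        out_f = model(x_f)
        loss_forget = F.mse_loss(out_f, torch.randn_like(out_f))

        # Retain term: keep outputs on retain inputs close to the original
        # model's outputs.
        with torch.no_grad():
            target_r = frozen(x_r)
        loss_retain = F.mse_loss(model(x_r), target_r)

        # alpha trades off forgetting against preservation.
        loss = loss_retain + alpha * loss_forget
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

Note that the abstract states the proposed algorithm does not rely on retain samples; in the paper's actual setting the retain term above would therefore be replaced or dropped. The sketch keeps it only to make the standard forget/retain trade-off explicit.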