

TOFU: A Task of Fictitious Unlearning for LLMs

January 11, 2024
Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
cs.AI

Abstract

Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles, called the forget set, that serves as the target for unlearning. We compile a suite of metrics that work together to provide a holistic picture of unlearning efficacy. Finally, we provide a set of baseline results from existing unlearning algorithms. Importantly, none of the baselines we consider show effective unlearning, motivating continued efforts to develop approaches for unlearning that effectively tune models so that they truly behave as if they were never trained on the forget data at all.
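To make the benchmark's setup concrete, the following is a minimal sketch of the data layout the abstract describes: a collection of synthetic author profiles, each holding question-answer pairs, with a fraction of the profiles designated as the forget set. The field names, the 10% split fraction, and all helper names here are illustrative assumptions, not the official TOFU format; only the counts (200 profiles, 20 QA pairs each) come from the paper.

```python
import random

def make_profiles(n_authors=200, qa_per_author=20):
    """Build toy synthetic author profiles, each a list of QA pairs.
    200 authors x 20 pairs mirrors the TOFU dataset sizes; the
    dict/field structure is an assumption for illustration."""
    profiles = {}
    for i in range(n_authors):
        profiles[f"author_{i}"] = [
            {"question": f"Q{j} about author_{i}?", "answer": f"A{j}"}
            for j in range(qa_per_author)
        ]
    return profiles

def split_forget_retain(profiles, forget_fraction=0.1, seed=0):
    """Designate a random subset of profiles as the forget set
    (the unlearning target); everything else is the retain set."""
    rng = random.Random(seed)
    authors = sorted(profiles)
    k = int(len(authors) * forget_fraction)
    forget_authors = set(rng.sample(authors, k))
    forget = {a: profiles[a] for a in authors if a in forget_authors}
    retain = {a: profiles[a] for a in authors if a not in forget_authors}
    return forget, retain

profiles = make_profiles()
forget, retain = split_forget_retain(profiles)
```

An unlearning method is then judged by whether the tuned model behaves, on the forget-set authors, like a model that was never trained on them, while preserving performance on the retain set.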