
TOFU: A Task of Fictitious Unlearning for LLMs

January 11, 2024
Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
cs.AI

Abstract

Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning. We compile a suite of metrics that work together to provide a holistic picture of unlearning efficacy. Finally, we provide a set of baseline results from existing unlearning algorithms. Importantly, none of the baselines we consider show effective unlearning, motivating continued efforts to develop approaches that effectively tune models so that they truly behave as if they were never trained on the forget data at all.
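The data layout described above (200 synthetic author profiles of 20 question-answer pairs each, with a subset designated as the forget set) can be made concrete with a short sketch. This is a minimal illustration in plain Python: the class and field names are hypothetical rather than the dataset's actual schema, and the 10% forget fraction is an assumed example, since the abstract says only that "a subset" of the profiles serves as the unlearning target.

```python
from dataclasses import dataclass


@dataclass
class AuthorProfile:
    """One synthetic author profile: 20 question-answer pairs about a fictitious person."""
    name: str
    qa_pairs: list[tuple[str, str]]  # (question, answer); 20 pairs per profile


def split_forget_retain(profiles: list[AuthorProfile], forget_fraction: float = 0.10):
    """Designate a subset of profiles as the forget set; the rest form the retain set.

    The 10% fraction here is illustrative only; the abstract states just that
    a subset of the 200 profiles is the target for unlearning.
    """
    n_forget = int(len(profiles) * forget_fraction)
    return profiles[:n_forget], profiles[n_forget:]


# 200 profiles x 20 QA pairs, as described in the abstract (contents are placeholders).
profiles = [
    AuthorProfile(name=f"Author {i}", qa_pairs=[(f"Q{j}", f"A{j}") for j in range(20)])
    for i in range(200)
]
forget_set, retain_set = split_forget_retain(profiles)
assert len(forget_set) + len(retain_set) == 200
```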
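The abstract does not name the baseline unlearning algorithms it evaluates. One widely studied baseline in the unlearning literature is gradient ascent on the forget set, i.e., maximizing the language-modeling loss on the data to be forgotten. The sketch below is written under that assumption and is not the authors' implementation; it assumes a Hugging Face-style causal LM whose forward pass returns an object with a `.loss` attribute when `labels` are supplied.

```python
import torch


def gradient_ascent_unlearn_step(model, batch, optimizer):
    """One gradient-ascent unlearning step on a forget-set batch.

    Assumes `model` is a causal LM (e.g., a transformers-style model) that
    returns an object with a `.loss` attribute when `labels` are provided,
    and that `batch` holds `input_ids` and `attention_mask` tensors.
    """
    model.train()
    optimizer.zero_grad()
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["input_ids"],  # standard next-token labels
    )
    # Negate the loss so the optimizer ascends it: the update pushes the
    # model to become *worse* at reproducing the forget-set answers.
    (-outputs.loss).backward()
    optimizer.step()
    return outputs.loss.item()
```

A known weakness of ascent-only updates is that they can degrade performance on retained data as well as on the forget set, which is one motivation for pairing forget-quality measures with utility measures in the suite of metrics the benchmark compiles.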