TOFU: A Task of Fictitious Unlearning for LLMs

Abstract

Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning. We compile a suite of metrics that work together to provide a holistic picture of unlearning efficacy. Finally, we provide a set of baseline results from existing unlearning algorithms. Importantly, none of the baselines we consider show effective unlearning, motivating continued efforts to develop approaches for unlearning that effectively tune models so that they truly behave as if they were never trained on the forget data at all.
