UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Abstract

Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently, unlearning is often discussed as an approach for removing impermissible knowledge, i.e., knowledge that the model should not possess, such as unlicensed copyrighted material, inaccurate information, or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper we revisit the paradigm in which unlearning is used in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce the concept of ununlearning, where unlearned knowledge is reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required and that even exact unlearning schemes are not enough for effective content regulation. We discuss the feasibility of ununlearning for modern LLMs and examine broader implications.
