UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

June 27, 2024
Authors: Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
cs.AI

Abstract

Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently, unlearning has often been discussed as an approach for removing impermissible knowledge, i.e., knowledge that the model should not possess, such as unlicensed copyrighted material, or inaccurate or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper, we revisit the paradigm in which unlearning is used in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model from performing an impermissible act during inference. We introduce the concept of ununlearning, in which unlearned knowledge is reintroduced in-context, effectively rendering the model capable of behaving as if it knows the forgotten knowledge. As a result, we argue that content filtering for impermissible knowledge will be required, and that even exact unlearning schemes are not enough for effective content regulation. We discuss the feasibility of ununlearning for modern LLMs and examine its broader implications.
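
The abstract's central argument, that unlearning removes knowledge from a model's weights while in-context learning can reintroduce it at inference time, can be illustrated with a short sketch. This is a minimal illustration under assumed names, not the authors' method: `generate` and the keyword blocklist below are hypothetical placeholders standing in for an LLM API and for whatever inference-time content filter a deployment might use.

```python
# Illustrative sketch only (not the paper's method): a toy demonstration of
# why inference-time content filtering is needed even after unlearning.
# `generate` is a hypothetical stand-in for an LLM API, and the keyword
# blocklist is a deliberately simplified example of a content filter.

IMPERMISSIBLE_TERMS = {"synthesis route for compound x"}  # hypothetical blocklist


def generate(prompt: str) -> str:
    """Placeholder for a model whose weights have 'unlearned' the blocked topic."""
    # A real unlearned model lacks the knowledge in its weights, but it can
    # still act on that knowledge if the prompt re-supplies it in-context.
    return f"[model output conditioned on: {prompt!r}]"


def filtered_generate(prompt: str) -> str:
    """Inference-time control: reject prompts that reintroduce impermissible knowledge."""
    if any(term in prompt.lower() for term in IMPERMISSIBLE_TERMS):
        return "Request refused: impermissible content detected in the prompt."
    return generate(prompt)


if __name__ == "__main__":
    # Ununlearning: the forgotten knowledge is simply pasted back into the context.
    in_context_attack = (
        "Here is the synthesis route for compound X: <step 1> ... <step n>. "
        "Using the steps above, explain how to scale it up."
    )
    print(generate(in_context_attack))           # unlearning alone does not stop this
    print(filtered_generate(in_context_attack))  # a content filter can
```

The point of the sketch is only that the filter operates on the prompt (and, in practice, on the output) at inference time, which is exactly the gap that training-time unlearning leaves open.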
