Practical Unlearning for Large Language Models
July 14, 2024
Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu
cs.AI
Abstract
While large language models (LLMs) have demonstrated impressive performance across various domains
and tasks, their security issues have become increasingly severe. Machine
unlearning (MU) has emerged as a promising solution to address these issues by
removing the influence of undesired data on the target model without
compromising its utility in other aspects. MU typically assumes full access to
the original training data to preserve utility, which is difficult to achieve
in LLM unlearning. Existing LLM unlearning methods often assume access to the
data most affected by unlearning the undesired data. However, this assumption
underestimates the entanglement among various LLM capabilities and ignores data
access limitations due to various issues. Moreover, these LLM unlearning
methods do not sufficiently consider that unlearning requests emerge
continuously in real-world scenarios. To overcome these challenges and achieve
practical LLM unlearning, we propose the O3 framework. The O3 framework
includes an Out-Of-Distribution (OOD) detector to measure the similarity
between input and unlearning data, and an Orthogonal low-rank adapter (LoRA)
for continuously unlearning requested data. The OOD detector is trained with a
novel contrastive entropy loss and utilizes a local-global layer-aggregated
scoring mechanism. The orthogonal LoRA achieves parameter disentanglement among
continual unlearning requests. During inference, our O3 framework can smartly
decide whether and to what extent to load the unlearning LoRA based on the OOD
detector's predictions. Notably, O3's effectiveness does not rely on any
retained data. We conducted extensive experiments comparing O3 with
state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results
indicate that O3 consistently achieves the best trade-off between unlearning
effectiveness and utility preservation, especially when facing continuous
unlearning requests.
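To make the described mechanism more concrete, below is a minimal, hypothetical sketch of how the pieces named in the abstract could fit together: an orthogonality penalty for disentangling successive unlearning LoRAs, a toy layer-aggregated similarity score standing in for the OOD detector, and score-gated loading of the LoRA at inference. The function names, tensor shapes, and gating rule are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only (not the authors' code): simplified stand-ins for
# three ideas named in the abstract:
#   1) an orthogonality penalty keeping a new unlearning LoRA's subspace
#      disentangled from previously learned adapters,
#   2) a toy layer-aggregated score measuring how close an input's hidden
#      states are to stored statistics of the unlearning data,
#   3) inference-time gating that loads the unlearning LoRA only to the extent
#      the score says the input resembles the forgotten data.
import torch
import torch.nn.functional as F


def orthogonality_penalty(new_A, old_As):
    """Squared Frobenius norm of new_A @ old_A^T for every earlier adapter.

    new_A, old_A: LoRA down-projection matrices of shape (r, d_in).
    The penalty is zero when the adapters' row spaces are orthogonal, the kind
    of parameter disentanglement the abstract describes.
    """
    penalty = new_A.new_zeros(())
    for old_A in old_As:
        penalty = penalty + (new_A @ old_A.T).pow(2).sum()
    return penalty


def layer_aggregated_score(layer_feats, ref_means):
    """Toy stand-in for a layer-aggregated scoring mechanism: average per-layer
    cosine similarity between the input's hidden states and stored mean
    features of the unlearning data (higher = more similar)."""
    sims = [F.cosine_similarity(h, m.unsqueeze(0), dim=-1).mean()
            for h, m in zip(layer_feats, ref_means)]
    return torch.stack(sims).mean()


def apply_gated_lora(x, W, lora_A, lora_B, score, threshold=0.5):
    """Frozen linear layer plus a LoRA update scaled by the detector's score.

    x: (batch, d_in), W: (d_out, d_in), lora_A: (r, d_in), lora_B: (d_out, r).
    Below the threshold the original weights are used untouched; above it the
    unlearning LoRA is blended in proportionally.
    """
    base = F.linear(x, W)
    if score < threshold:
        return base
    gate = (score - threshold) / (1.0 - threshold)
    return base + gate * F.linear(F.linear(x, lora_A), lora_B)


if __name__ == "__main__":
    d_in, d_out, r, n_layers = 16, 8, 4, 3
    x = torch.randn(2, d_in)
    W = torch.randn(d_out, d_in)
    A_new, B_new = torch.randn(r, d_in), torch.randn(d_out, r)
    A_old = [torch.randn(r, d_in)]

    feats = [torch.randn(2, d_in) for _ in range(n_layers)]   # per-layer hidden states
    ref = [torch.randn(d_in) for _ in range(n_layers)]        # stored unlearning-data means

    print("orthogonality penalty:", orthogonality_penalty(A_new, A_old).item())
    s = layer_aggregated_score(feats, ref).item()
    print("aggregated similarity score:", s)
    print("gated output shape:", apply_gated_lora(x, W, A_new, B_new, score=s).shape)
```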