Practical Unlearning for Large Language Models

July 14, 2024
Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu
cs.AI

Abstract

While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training data to preserve utility, which is difficult to achieve in LLM unlearning. Existing LLM unlearning methods often assume access to data most affected by undesired data unlearning. However, this assumption underestimates the entanglement among various LLM capabilities and ignores data access limitations due to various issues. Moreover, these LLM unlearning methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging. To overcome these challenges and achieve practical LLM unlearning, we propose the O3 framework. The O3 framework includes an Out-Of-Distribution (OOD) detector to measure the similarity between input and unlearning data, and an Orthogonal low-rank adapter (LoRA) for continuously unlearning requested data. The OOD detector is trained with a novel contrastive entropy loss and utilizes a local-global layer-aggregated scoring mechanism. The orthogonal LoRA achieves parameter disentanglement among continual unlearning requests. During inference, our O3 framework can smartly decide whether and to what extent to load the unlearning LoRA based on the OOD detector's predictions. Notably, O3's effectiveness does not rely on any retained data. We conducted extensive experiments on O3 and state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results indicate that O3 consistently achieves the best trade-off between unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests.
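To make the mechanism described above more concrete, the following is a minimal PyTorch sketch of the two ideas the abstract names: an orthogonality penalty that disentangles the LoRA parameters learned for successive unlearning requests, and an OOD-score-driven gate that decides how strongly to apply the unlearning adapter at inference. This is not the paper's implementation; the names (`LoRAAdapter`, `orthogonality_penalty`, `gate_from_ood_score`), the sigmoid gating with a threshold and temperature, and the Frobenius-norm overlap penalty are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Low-rank adapter on a frozen linear layer: y = W x + gate * scale * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the base LLM weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale
        self.gate = 1.0  # set at inference time from the OOD detector's score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.gate * self.scale * (x @ self.A.t() @ self.B.t())


def orthogonality_penalty(adapter_new: LoRAAdapter,
                          adapters_old: list) -> torch.Tensor:
    """Training-time regularizer (assumed form): penalize overlap between the new
    request's LoRA subspace and the subspaces of earlier requests, so that
    continual unlearning requests stay parameter-disentangled."""
    penalty = torch.zeros(())
    for old in adapters_old:
        # A_new @ A_old^T vanishes when the two row spaces are orthogonal.
        penalty = penalty + (adapter_new.A @ old.A.detach().t()).pow(2).sum()
    return penalty


def gate_from_ood_score(score: float, threshold: float,
                        temperature: float = 10.0) -> float:
    """Map the OOD detector's similarity score to a soft loading weight in [0, 1]:
    inputs that resemble the unlearned data apply the adapter more strongly;
    unrelated inputs pass through the frozen base model almost untouched."""
    return torch.sigmoid(torch.tensor((score - threshold) * temperature)).item()


if __name__ == "__main__":
    # Toy usage: gate one adapter layer by a hypothetical OOD score.
    base = nn.Linear(512, 512)
    adapter = LoRAAdapter(base, rank=8)
    adapter.gate = gate_from_ood_score(score=0.8, threshold=0.5)
    y = adapter(torch.randn(2, 512))
    print(y.shape)  # torch.Size([2, 512])
```

Under these assumptions, no retained data is needed at inference: the decision of whether, and to what extent, to load the unlearning LoRA depends only on the OOD detector's score for the incoming input.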
