ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

Abstract

Because they are trained on massive web corpora, large language models inevitably retain sensitive information, defined here as inputs that may induce harmful generations, which raises privacy and safety concerns. Existing machine unlearning methods rely primarily on retraining or aggressive fine-tuning, which are either computationally expensive or prone to degrading related knowledge and overall model utility. In this work, we reformulate machine unlearning as a precise knowledge re-mapping problem solved via model editing. We propose ZeroUnlearn, a few-shot unlearning framework that overwrites sensitive inputs by mapping them to a neutral target state and removing their original representations. ZeroUnlearn enforces representational orthogonality through a multiplicative parameter update with a closed-form solution, enabling efficient and targeted unlearning. We further extend ZeroUnlearn to a gradient-based variant for multi-sample unlearning. Experiments demonstrate that our approach outperforms existing baselines while preserving general model utility. Our code is available on GitHub: https://github.com/XMUDeepLIT/ZeroUnlearn.
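
To make the idea of a closed-form multiplicative update concrete, the following is a minimal illustrative sketch for a single linear layer. It is not the ZeroUnlearn update itself, whose exact form is not given in the abstract. The function name `multiplicative_remap`, the rank-one construction of `M`, and the choice of a neutral target orthogonal to the original output are all assumptions made for illustration.

```python
import numpy as np


def multiplicative_remap(W, k, v_target):
    """Hypothetical closed-form edit W_new = M @ W (illustration only).

    W        : (d_out, d_in) weight matrix of one linear layer
    k        : (d_in,) representation of a sensitive input
    v_target : (d_out,) neutral target output
    M sends the original output v = W @ k to v_target and leaves every
    output direction orthogonal to v unchanged.
    """
    v = W @ k
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("W @ k is zero; there is no direction to remap")
    u = v / norm  # unit vector along the original output
    d_out = W.shape[0]
    # (I - u u^T) removes the original direction; the last term writes the target.
    M = np.eye(d_out) - np.outer(u, u) + np.outer(v_target, u) / norm
    return M @ W


rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
k = rng.normal(size=4)
v_old = W @ k

# Assumed neutral target: a random vector projected to be orthogonal to v_old.
raw = rng.normal(size=6)
v_neutral = raw - (raw @ v_old) / (v_old @ v_old) * v_old

W_new = multiplicative_remap(W, k, v_neutral)
print(np.allclose(W_new @ k, v_neutral))  # sensitive input now maps to the target
print(abs((W_new @ k) @ v_old) < 1e-8)    # new output is orthogonal to the old one
```

In this toy setting, only outputs with a component along the original direction are altered, which is one simple way to picture a targeted edit. Inputs whose outputs overlap with that direction are still affected, so the sketch says nothing about how the actual method preserves related knowledge.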