

WaterDrum: Watermarking for Data-centric Unlearning Metric

May 8, 2025
作者: Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low
cs.AI

Abstract

Large language model (LLM) unlearning is critical in real-world applications where the influence of some users' private, copyrighted, or harmful data must be removed efficiently. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings, such as when (a) the forget and retain sets have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents WaterDrum, the first data-centric unlearning metric for LLMs, which exploits robust text watermarking to overcome these limitations. We also introduce new benchmark datasets for LLM unlearning that contain data points with varying levels of similarity and can be used to rigorously evaluate unlearning algorithms with WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum, and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.
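
The abstract does not spell out the mechanism, but the core idea of a data-centric metric (scoring unlearning by residual watermark strength in model outputs rather than by model utility) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: `generate`, `detect_watermark`, and `forget_prompts` are placeholder names for a model's text generator, a robust text-watermark detector, and prompts drawn from the forget set.

```python
from typing import Callable, Iterable

def unlearning_score(
    generate: Callable[[str], str],           # model's text generator (assumed interface)
    forget_prompts: Iterable[str],            # prompts drawn from the forget set
    detect_watermark: Callable[[str], float], # watermark strength in [0, 1] (assumed interface)
) -> float:
    """Average residual watermark strength over generations on the forget set.

    A score near 0 suggests the forget-set owner's watermark, and hence the
    influence of their data, has been removed; a score near 1 suggests the
    model still reproduces watermarked content.
    """
    strengths = [detect_watermark(generate(p)) for p in forget_prompts]
    return sum(strengths) / max(len(strengths), 1)
```

Because the signal comes from watermarks embedded in the data itself, such a metric remains meaningful even when the forget and retain sets are semantically similar and when retraining from scratch for comparison is impractical.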
