WaterDrum: Watermarking for Data-centric Unlearning Metric
May 8, 2025
作者: Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low
cs.AI
Abstract
Large language model (LLM) unlearning is critical in real-world applications
where it is necessary to efficiently remove the influence of private,
copyrighted, or harmful data from some users. However, existing utility-centric
unlearning metrics (based on model utility) may fail to accurately evaluate the
extent of unlearning in realistic settings such as when (a) the forget and
retain sets have semantically similar content, (b) retraining the model from
scratch on the retain set is impractical, and/or (c) the model owner can
improve the unlearning metric without directly performing unlearning on the
LLM. This paper presents the first data-centric unlearning metric for LLMs
called WaterDrum, which exploits robust text watermarking to overcome these
limitations. We also introduce new benchmark datasets for LLM unlearning that
contain varying levels of similar data points and can be used to rigorously
evaluate unlearning algorithms using WaterDrum. Our code is available at
https://github.com/lululu008/WaterDrum and our new benchmark datasets are
released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.
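To make the core idea concrete: a data-centric metric of this kind hinges on embedding a robust, owner-specific watermark in training text and then measuring how strongly that watermark persists in the model's generations after unlearning. The sketch below is only an illustrative simplification under assumed details, not the authors' WaterDrum implementation; the hash-seeded "green-list" detector and the helper names (`_is_green`, `watermark_strength`, `waterdrum_style_score`) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation) of a watermark-based,
# data-centric unlearning score: compare the strength of a data owner's text
# watermark in model generations before vs. after unlearning.

import hashlib
from typing import Iterable


def _is_green(prev_token: str, token: str, key: str, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign `token` to the key-specific green list, seeded by the previous token."""
    h = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return (h[0] / 255.0) < gamma


def watermark_strength(text: str, key: str, gamma: float = 0.5) -> float:
    """Fraction of token bigrams landing in the key's green list, in excess of the chance rate `gamma`."""
    tokens = text.split()
    if len(tokens) < 2:
        return 0.0
    green = sum(_is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return green / (len(tokens) - 1) - gamma


def waterdrum_style_score(generations_before: Iterable[str],
                          generations_after: Iterable[str],
                          owner_key: str) -> float:
    """Drop in the forget-owner's watermark strength after unlearning (higher = more forgetting)."""
    before = [watermark_strength(g, owner_key) for g in generations_before]
    after = [watermark_strength(g, owner_key) for g in generations_after]
    return sum(before) / len(before) - sum(after) / len(after)


if __name__ == "__main__":
    # Placeholder strings standing in for model generations prompted on the forget set.
    gen_before = ["the cat sat on the mat and watched the rain fall outside"]
    gen_after = ["an entirely different sentence with no memorised content at all"]
    print(waterdrum_style_score(gen_before, gen_after, owner_key="owner-42"))
```

The design point this sketch tries to capture is why such a metric is data-centric: it depends only on watermarks carried by the data and on model outputs, not on a retrained-from-scratch reference model or on utility comparisons against the retain set.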