Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
May 9, 2025
Authors: Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz
cs.AI
Abstract
This paper introduces Unilogit, a novel self-distillation method for machine
unlearning in Large Language Models. Unilogit addresses the challenge of
selectively forgetting specific information while maintaining overall model
utility, a critical task in compliance with data privacy regulations like GDPR.
Unlike prior methods that rely on static hyperparameters or the starting
model's outputs, Unilogit dynamically adjusts target logits to achieve a
uniform probability for the target token, leveraging the current model's
outputs for
more accurate self-distillation targets. This approach not only eliminates the
need for additional hyperparameters but also enhances the model's ability to
approximate the golden targets. Extensive experiments on public benchmarks and
an in-house e-commerce dataset demonstrate Unilogit's superior performance in
balancing forget and retain objectives, outperforming state-of-the-art methods
such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness
across various scenarios, highlighting its practical applicability and
effectiveness for machine unlearning.
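
To make the core idea concrete, here is a minimal PyTorch sketch of how a
uniform-target self-distillation objective of this kind could look. Everything
below is an illustrative assumption reconstructed from the abstract, not the
authors' reference code: the function names (`unilogit_targets`,
`unilogit_forget_loss`), the choice to redistribute the leftover probability
mass proportionally to the current model's distribution, and the KL-based loss
are all hypothetical details consistent with, but not confirmed by, the text.

```python
import torch
import torch.nn.functional as F


def unilogit_targets(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Build soft targets where the ground-truth token's probability is pushed
    down to the uniform value 1/|V|, with the remaining mass redistributed over
    the other tokens in proportion to the current model's own distribution.

    logits:     (batch, seq, vocab) logits from the *current* model
    target_ids: (batch, seq) ids of the tokens to be forgotten
    """
    with torch.no_grad():  # targets are constants for the distillation loss
        vocab_size = logits.size(-1)
        probs = F.softmax(logits, dim=-1)
        # Current probability assigned to the token we want to forget.
        p_target = probs.gather(-1, target_ids.unsqueeze(-1))
        # Rescale the non-target tokens so that, once the target token is
        # pinned at 1/|V|, the distribution still sums to one.
        scale = (1.0 - 1.0 / vocab_size) / (1.0 - p_target).clamp_min(1e-12)
        targets = probs * scale
        targets.scatter_(-1, target_ids.unsqueeze(-1), 1.0 / vocab_size)
    return targets


def unilogit_forget_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """KL(targets || model) on the forget set; in practice this would be
    combined with a retain-set objective to preserve overall utility."""
    targets = unilogit_targets(logits, target_ids)
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, targets, reduction="batchmean")
```

Two aspects of this sketch mirror the claims in the abstract: pinning the
target token at exactly 1/|V| removes the need for a tuned adjustment-strength
hyperparameter, and recomputing the targets from the current model at each step
(rather than from the starting checkpoint) is what yields the more accurate
self-distillation targets the paper describes.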