

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

September 17, 2025
Authors: Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
cs.AI

Abstract

While Code Language Models (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including training data de-duplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following research question: Can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning, a post-hoc modification method that removes specific information from trained models without requiring full retraining. Specifically, we first quantify the memorization risks of sensitive data within CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorized samples as unlearning targets. We study two widely used gradient ascent-based unlearning approaches: the vanilla and constraint-based methods, and introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs, i.e., CodeParrot, CodeGen-Mono, and Qwen2.5-Coder, validate the effectiveness and efficiency of CodeEraser in erasing targeted sensitive memorization while maintaining model utility.
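To make the described mechanism concrete, the following is a minimal Python sketch of gradient-ascent unlearning with selective label masking, in the spirit of CodeEraser's segment-level erasure. It is an illustration under assumptions, not the authors' implementation: the checkpoint name, character offsets, learning rate, and the helper unlearn_step are all hypothetical choices made here for demonstration.

# Minimal sketch of selective gradient-ascent unlearning (assumed setup,
# not the paper's code). Requires a Hugging Face fast tokenizer for
# return_offsets_mapping.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codeparrot/codeparrot-small"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed lr

def unlearn_step(code: str, sensitive_spans: list[tuple[int, int]]) -> float:
    """One gradient-ascent step targeting only sensitive character spans.

    sensitive_spans holds (start, end) character offsets of secrets in
    `code`, e.g. a hard-coded API key. Labels outside those spans are set
    to -100, so the loss (and hence the ascent) touches only the memorized
    secret while the surrounding code is left untouched.
    """
    enc = tokenizer(code, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]  # (seq_len, 2) char offsets
    labels = enc["input_ids"].clone()

    # Keep labels only for tokens overlapping a sensitive span.
    keep = torch.zeros(labels.shape[1], dtype=torch.bool)
    for start, end in sensitive_spans:
        keep |= (offsets[:, 0] < end) & (offsets[:, 1] > start)
    labels[0, ~keep] = -100

    loss = model(**enc, labels=labels).loss
    (-loss).backward()  # ascend on the secret tokens' loss to erase them
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

sample = 'API_KEY = "sk-test-1234567890abcdef"\nprint(API_KEY)'
print(unlearn_step(sample, [(11, 35)]))  # span covering the key literal

The constraint-based variant the abstract mentions would extend this step with a regularization term (e.g., a KL penalty against the original model's outputs on retained data) to bound utility loss; that term is omitted here for brevity.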