Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
September 17, 2025
Authors: Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
cs.AI
Abstract
While Code Language Models (CLMs) have demonstrated superior performance in
software engineering tasks such as code generation and summarization, recent
empirical studies reveal a critical privacy vulnerability: these models exhibit
unintended memorization of sensitive training data, enabling verbatim
reproduction of confidential information when specifically prompted. To address
this issue, several approaches, including training data de-duplication and
differential privacy augmentation, have been proposed. However, these methods
require full-model retraining for deployed CLMs, which incurs substantial
computational costs. In this paper, we aim to answer the following research
question: Can sensitive information memorized by CLMs be erased effectively and
efficiently?
We conduct a pioneering investigation into erasing sensitive memorization in
CLMs through machine unlearning, a post-hoc modification method that removes
specific information from trained models without requiring full retraining.
Specifically, we first quantify the memorization risks of sensitive data within
CLM training datasets and curate a high-risk dataset of 50,000 sensitive
memorized samples as unlearning targets. We study two widely used gradient
ascent-based unlearning approaches: the vanilla and constraint-based methods,
and introduce CodeEraser, an advanced variant that selectively unlearns
sensitive memorized segments in code while preserving the structural integrity
and functional correctness of the surrounding code. Extensive experiments on
three families of CLMs, i.e., CodeParrot, CodeGen-Mono, and Qwen2.5-Coder,
validate the effectiveness and efficiency of CodeEraser in erasing targeted
sensitive memorization while maintaining model utility.
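To make the mechanics concrete, here is a minimal, self-contained sketch of gradient-ascent unlearning with the loss restricted to a sensitive token span, in the spirit of the selective unlearning described above. The checkpoint name, span format, learning rate, and single-step loop are illustrative assumptions, not the paper's CodeEraser implementation.

```python
# Minimal sketch: gradient-ascent unlearning restricted to sensitive spans.
# Everything here (checkpoint, spans, learning rate) is an illustrative
# assumption, not the paper's CodeEraser implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codeparrot/codeparrot-small"  # assumed small checkpoint for the demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def unlearn_step(code, sensitive_spans):
    """One gradient-ascent step targeting only the given character spans."""
    enc = tokenizer(code, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    labels = enc["input_ids"].clone()
    # Ignore (label = -100) every token that does not overlap a sensitive span,
    # so the loss, and hence the ascent step, touches only the secret.
    for i, (start, end) in enumerate(offsets):
        if not any(s < end and start < e for s, e in sensitive_spans):
            labels[0, i] = -100
    loss = model(**enc, labels=labels).loss
    (-loss).backward()  # negate the NLL: ascend on the secret's likelihood
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Toy usage: push down the likelihood of a hard-coded dummy key while leaving
# the surrounding line of code out of the objective entirely.
snippet = 'API_KEY = "sk-XXXX-DUMMY"\nprint("hello")\n'
span = (snippet.index('"sk'), snippet.index('DUMMY"') + len('DUMMY"'))
print(unlearn_step(snippet, [span]))
```

Negating the language-modeling loss turns the usual descent step into ascent on the secret's likelihood, and masking all other positions keeps that pressure off the surrounding code, which is the intuition behind preserving its structure and functionality. A constraint-based variant would add a penalty to the same loop (for example, a KL term against the original model on a retain set) to further limit utility loss.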