Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Abstract

Code Language Models (CLMs) perform strongly on software engineering tasks such as code generation and summarization, but recent empirical studies reveal a critical privacy vulnerability: these models unintentionally memorize sensitive training data and can reproduce confidential information verbatim when given specific prompts. Several mitigations have been proposed, including training data de-duplication and differential privacy augmentation. However, applying them to an already deployed CLM requires retraining the full model, which incurs substantial computational cost. In this paper, we ask the following research question: can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning, a post-hoc modification method that removes specific information from a trained model without full retraining. Specifically, we first quantify the memorization risk of sensitive data in CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorized samples as unlearning targets. We then study two widely used gradient ascent-based unlearning approaches, the vanilla method and the constraint-based method, and introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs (CodeParrot, CodeGen-Mono, and Qwen2.5-Coder) validate that CodeEraser erases the targeted sensitive memorization effectively and efficiently while maintaining model utility.
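The abstract does not spell out the unlearning objective, so the following is only a rough illustration of the general idea of gradient ascent restricted to flagged tokens, not CodeEraser itself. It assumes a toy bigram model stored as a table of logits; the token ids, the mask, the learning rate, and the names `masked_nll` and `unlearn_step` are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_nll(logits_table, tokens, mask):
    """Mean negative log-likelihood over the flagged target positions only."""
    losses = []
    for i in range(1, len(tokens)):
        if mask[i]:
            probs = softmax(logits_table[tokens[i - 1]])
            losses.append(-math.log(probs[tokens[i]]))
    return sum(losses) / len(losses)

def unlearn_step(logits_table, tokens, mask, lr=0.5):
    """One gradient-ascent step on the masked NLL, applied in place."""
    flagged = [i for i in range(1, len(tokens)) if mask[i]]
    if not flagged:
        return
    grads = {}
    for i in flagged:
        prev, target = tokens[i - 1], tokens[i]
        probs = softmax(logits_table[prev])
        row = grads.setdefault(prev, [0.0] * len(probs))
        for k, p in enumerate(probs):
            # d(NLL)/d(logit_k) = p_k - 1[k == target]
            row[k] += (p - (1.0 if k == target else 0.0)) / len(flagged)
    for prev, row in grads.items():
        for k, g in enumerate(row):
            # Ascent: step along +gradient so the flagged loss goes up.
            logits_table[prev][k] += lr * g

# Toy vocabulary (hypothetical): 0 = "key", 1 = "=", 2 = a secret value, 3 = newline.
tokens = [0, 1, 2, 3]
secret_mask = [False, False, True, False]   # only the secret token is flagged
context_mask = [False, True, False, True]   # surrounding, non-sensitive tokens

# A table that has "memorized" the sequence: each row strongly prefers the next token.
logits_table = [
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
]

before_secret = masked_nll(logits_table, tokens, secret_mask)
before_context = masked_nll(logits_table, tokens, context_mask)

for _ in range(20):
    unlearn_step(logits_table, tokens, secret_mask)

after_secret = masked_nll(logits_table, tokens, secret_mask)
after_context = masked_nll(logits_table, tokens, context_mask)

print("secret NLL:", before_secret, "->", after_secret)
print("context NLL:", before_context, "->", after_context)
```

In this toy table the context predictions are untouched because each row is an independent set of parameters. A real network shares parameters across tokens, so that isolation does not come for free there.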
