The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also memorize portions of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. To address these concerns, this paper synthesizes recent studies and surveys the landscape of memorization, the factors that influence it, and methods for detecting and mitigating it. We explore the key drivers of memorization, including training data duplication, training dynamics, and fine-tuning procedures. We also examine methodologies such as prefix-based extraction, membership inference, and adversarial prompting, and assess how effectively they detect and measure memorized content. Beyond the technical analysis, we consider the broader consequences of memorization, including its legal and ethical implications. Finally, we discuss mitigation strategies, including data cleaning, differential privacy, and post-training unlearning, and highlight open challenges in minimizing harmful memorization while preserving utility. This paper provides a comprehensive overview of current research on LLM memorization across technical, privacy, and performance dimensions, and identifies critical directions for future work.
