The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation

July 8, 2025
Authors: Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song
cs.AI

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also exhibit memorization of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. Addressing these concerns, this paper synthesizes recent studies and investigates the landscape of memorization, the factors influencing it, and methods for its detection and mitigation. We explore key drivers of memorization, including training data duplication, training dynamics, and fine-tuning procedures. In addition, we examine methodologies such as prefix-based extraction, membership inference, and adversarial prompting, assessing their effectiveness in detecting and measuring memorized content. Beyond technical analysis, we also explore the broader implications of memorization, including its legal and ethical dimensions. Finally, we discuss mitigation strategies, including data cleaning, differential privacy, and post-training unlearning, while highlighting open challenges in balancing the minimization of harmful memorization with model utility. This paper provides a comprehensive overview of the current state of research on LLM memorization across technical, privacy, and performance dimensions, identifying critical directions for future work.
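
Of the detection methods the abstract lists, prefix-based extraction and loss-based membership inference are the simplest to illustrate. The sketch below is a minimal toy version, not the paper's protocol: it assumes a Hugging Face causal LM (gpt2 is a stand-in for whatever model is being audited), greedy decoding, and the common loss-threshold variant of membership inference; the function names, prefix/suffix lengths, and the sample text are all illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def is_extractable(model, tokenizer, text, prefix_len=50, suffix_len=50):
    # Prefix-based extraction test: prompt with the first prefix_len tokens
    # of a suspected training example and check whether greedy decoding
    # reproduces the next suffix_len tokens verbatim.
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if ids.numel() < prefix_len + suffix_len:
        return False  # text too short to run the test
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_len,
                             do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    continuation = out[0, prefix_len:prefix_len + suffix_len]
    return torch.equal(continuation, target)

def membership_score(model, tokenizer, text):
    # Loss-based membership inference: a low average token loss (low
    # perplexity) on a candidate text is weak evidence that it was in the
    # training set. Real attacks calibrate this against reference models
    # or per-example thresholds rather than reading the raw value.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
candidate = ("Call me Ishmael. Some years ago, never mind how long "
             "precisely, having little or no money in my purse...")
print(is_extractable(lm, tok, candidate, prefix_len=8, suffix_len=8))
print(membership_score(lm, tok, candidate))

In practice, extraction tests of this kind are run over many thousands of sampled training prefixes, and the fraction of suffixes reproduced verbatim is reported as the memorization rate.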
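On the mitigation side, the data-cleaning step typically begins with deduplication, since duplicated training text is among the best-documented drivers of memorization. Below is a minimal sketch of exact-match deduplication by content hash; production pipelines add near-duplicate detection (for example, MinHash over n-grams), which this toy version omits.

import hashlib

def dedup_exact(docs):
    # Keep the first occurrence of each distinct document, comparing by
    # SHA-256 of the whitespace-trimmed text. Near-duplicates pass through.
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["the same article", "the same article", "a different article"]
print(dedup_exact(corpus))  # -> ['the same article', 'a different article']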