

Detecting Pretraining Data from Large Language Models

October 25, 2023
Authors: Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
cs.AI

Abstract

Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but certain that it includes potentially problematic text such as copyrighted materials, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text? To facilitate this study, we introduce a dynamic benchmark WIKIMIA that uses data created before and after model training to support gold truth detection. We also introduce a new detection method Min-K% Prob based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities. Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data. Moreover, our experiments demonstrate that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to two real-world scenarios, copyrighted book detection, and contaminated downstream example detection, and find it a consistently effective solution.
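The hypothesis above suggests a simple membership score: compute each token's probability under the LLM, average the log-probabilities of the k% least likely tokens, and treat a high average as evidence the text was seen during pretraining. Below is a minimal sketch of such a score using a Hugging Face causal LM; the function name, the `gpt2` stand-in model, and the k = 20% default are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a Min-K% Prob-style score, assuming a Hugging Face causal LM.
# The function name, model choice, and any decision threshold are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def min_k_prob_score(text: str, model, tokenizer, k: float = 0.2) -> float:
    """Average log-probability of the k% lowest-probability tokens in `text`."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits  # (1, seq_len, vocab_size)
    # Log-probability assigned to each actual next token given its preceding context.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, input_ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Average only over the k% least likely tokens (the "outlier" words in the hypothesis).
    num_lowest = max(1, int(len(token_log_probs) * k))
    lowest = torch.topk(token_log_probs, num_lowest, largest=False).values
    return lowest.mean().item()


if __name__ == "__main__":
    model_name = "gpt2"  # stand-in; the paper evaluates much larger LLMs
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    score = min_k_prob_score("Some candidate passage to test for membership.", model, tokenizer)
    # Higher (less negative) scores suggest the passage is more likely to have appeared
    # in pretraining; a concrete threshold would need calibration, e.g., on WIKIMIA.
    print(score)
```

In this sketch the only model access required is next-token probabilities, which matches the abstract's claim that the method needs no knowledge of the pretraining corpus and no reference-model training.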