Detecting Pretraining Data from Large Language Models

Abstract

Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the scale of this data, up to trillions of tokens, it is all but certain to include potentially problematic text such as copyrighted material, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM, without knowledge of its pretraining data, can we determine whether the model was trained on that text? To facilitate this study, we introduce WIKIMIA, a dynamic benchmark that uses data created before and after model training to support gold truth detection. We also introduce a new detection method, Min-K% Prob, based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to contain words with such low probabilities. Min-K% Prob can be applied without any knowledge of the pretraining corpus or any additional training, unlike previous detection methods, which require training a reference model on data similar to the pretraining data. Moreover, our experiments show that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to two real-world scenarios, copyrighted book detection and contaminated downstream example detection, and find it a consistently effective solution.
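
The hypothesis behind Min-K% Prob can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so this sketch assumes the score is the mean log-probability of the k% of tokens to which the model assigns the lowest probability. The function names, the default `k`, and the decision threshold are all hypothetical choices for illustration, and the per-token log-probabilities are assumed to have been obtained from the LLM beforehand.

```python
import math
from typing import Sequence


def min_k_percent_score(token_log_probs: Sequence[float], k: float = 0.2) -> float:
    """Mean log-probability of the fraction k of tokens the model finds least likely.

    Assumption: this averaging rule is an illustrative reading of the method,
    not a formula stated in the abstract.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so very short texts still get a score.
    n_lowest = max(1, math.ceil(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n_lowest]
    return sum(lowest) / n_lowest


def looks_seen(token_log_probs: Sequence[float], k: float = 0.2,
               threshold: float = -4.0) -> bool:
    """Flag a text as likely seen when its score exceeds a (hypothetical) threshold."""
    return min_k_percent_score(token_log_probs, k) > threshold


# Made-up log-probabilities: the second text contains two low-probability outliers.
seen_like = [-0.1, -0.3, -0.2, -0.5, -0.4, -0.2, -0.1, -0.6, -0.3, -0.2]
unseen_like = [-0.1, -0.3, -7.5, -0.5, -0.4, -9.0, -0.1, -0.6, -0.3, -0.2]
print(min_k_percent_score(seen_like), min_k_percent_score(unseen_like))
```

Averaging only the lowest-probability tokens makes the score sensitive to the few outlier words the hypothesis predicts for unseen text, whereas an average over all tokens would dilute them. In practice the threshold would have to be calibrated on labeled data, such as a benchmark of texts known to predate or postdate model training.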

