Detecting Pretraining Data from Large Language Models

Abstract

Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the scale of this data, up to trillions of tokens, it is all but certain that it includes potentially problematic text such as copyrighted materials, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM whose pretraining data is unknown, can we determine whether the model was trained on the provided text? To facilitate this study, we introduce a dynamic benchmark, WIKIMIA, that uses data created before and after model training to support gold truth detection. We also introduce a new detection method, Min-K% Prob, based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities. Min-K% Prob can be applied without any knowledge of the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data similar to the pretraining data. Moreover, our experiments demonstrate that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to two real-world scenarios, copyrighted book detection and contaminated downstream example detection, and find it a consistently effective solution.
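The hypothesis behind Min-K% Prob can be illustrated with a short sketch. The abstract does not give the scoring formula, so the code below assumes one plausible reading: score a text by the mean log-probability of the k% of its tokens that the model finds least likely, and treat a higher score as evidence that the text was seen during pretraining. The function names `min_k_prob` and `looks_seen`, the default `k`, and the threshold value are illustrative assumptions, and the per-token log-probabilities are assumed to have been obtained from the model beforehand.

```python
import math
from typing import Sequence


def min_k_prob(token_log_probs: Sequence[float], k: float = 0.2) -> float:
    """Mean log-probability of the k fraction of tokens with the lowest log-probability.

    Assumption: this is one reading of the Min-K% Prob idea, not a
    confirmed reproduction of the authors' implementation.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so short texts still get a score.
    n = max(1, math.ceil(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def looks_seen(token_log_probs: Sequence[float], k: float = 0.2,
               threshold: float = -2.0) -> bool:
    """Hypothetical decision rule: a score above the threshold suggests a seen text.

    The threshold is an arbitrary placeholder; in practice it would be
    tuned on examples with known membership.
    """
    return min_k_prob(token_log_probs, k) > threshold


# Toy log-probabilities (made up for illustration, not model output).
seen_example = [-0.1, -0.3, -0.2, -0.5, -0.4]    # no strong outliers
unseen_example = [-0.1, -0.3, -9.0, -0.5, -0.4]  # one very unlikely token

print(min_k_prob(seen_example, k=0.4))
print(min_k_prob(unseen_example, k=0.4))
```

Because only the lowest-probability tokens enter the score, a single outlier word pulls the unseen example's score far below the seen example's, which is the separation the hypothesis relies on. No reference model or knowledge of the pretraining corpus is needed, only per-token probabilities from the target model.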

