Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Abstract

Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel approach that uses prompt-tuning to control the extraction rates of memorized content in LLMs. We present two prompt training strategies, one to increase and one to decrease extraction rates, which correspond to an attack and a defense, respectively. We demonstrate the effectiveness of our techniques using models from the GPT-Neo family on a public benchmark. For the 1.3B parameter GPT-Neo model, our attack yields a 9.3 percentage point increase in extraction rate compared to our baseline. Our defense can be tuned to achieve different privacy-utility trade-offs through a user-specified hyperparameter. We achieve an extraction rate reduction of up to 97.7% relative to our baseline, with a perplexity increase of 16.9%.
