Three Bricks to Consolidate Watermarks for Large Language Models

Abstract

Distinguishing generated text from natural text is increasingly difficult. In this context, watermarking emerges as a promising technique for attributing generated text to a specific model. It alters the sampling process during generation so as to leave an invisible trace in the output, which facilitates later detection. This research consolidates watermarks for large language models (LLMs) based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees, which remain valid even at low false-positive rates (less than 10^{-6}). Second, we compare the effectiveness of watermarks using classical benchmarks from the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as for multi-bit watermarking.
