Three Bricks to Consolidate Watermarks for Large Language Models

Abstract

The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10^{-6}). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
