Three Bricks to Consolidate Watermarks for Large Language Models

Abstract

The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10^{-6}). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
