Three Bricks to Consolidate Watermarks for Large Language Models

Abstract

Distinguishing generated text from natural text is increasingly difficult. In this context, watermarking emerges as a promising technique for attributing generated text to a specific model. It alters the sampling process during generation so as to leave an invisible trace in the output, which facilitates later detection. This research consolidates watermarks for large language models (LLMs) on the basis of three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees, which remain valid even at low false-positive rates (below 10^{-6}). Second, we compare the effectiveness of watermarks using classical benchmarks from natural language processing, gaining insight into their real-world applicability. Third, we develop advanced detection schemes for scenarios where the LLM is accessible, as well as for multi-bit watermarking.
