Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Abstract

We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that the difference in their paraphrasing, reflected in the text semantics, can be identified by a trained decoder. To embed the multi-bit watermark, we alternate between the two paraphrasers to encode a pre-defined binary code at the sentence level. A text classifier then serves as the decoder and recovers each bit of the watermark. Through extensive experiments, we show that our watermarks achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation. We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.
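The sentence-level encoding described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `embed_watermark`, `decode_watermark`, and the toy paraphrasers and classifier are stand-ins for the fine-tuned LLM paraphrasers and the trained decoder, and they are not taken from the released code. Repeating the binary code when the text has more sentences than bits is likewise an assumption made for illustration.

```python
from typing import Callable, List

Paraphraser = Callable[[str], str]
BitClassifier = Callable[[str], int]


def embed_watermark(
    sentences: List[str],
    bits: List[int],
    paraphraser_0: Paraphraser,
    paraphraser_1: Paraphraser,
) -> List[str]:
    """Paraphrase each sentence with the paraphraser selected by its bit."""
    marked = []
    for i, sentence in enumerate(sentences):
        # Assumption: the code is repeated cyclically over the sentences.
        bit = bits[i % len(bits)]
        chosen = paraphraser_1 if bit == 1 else paraphraser_0
        marked.append(chosen(sentence))
    return marked


def decode_watermark(sentences: List[str], classifier: BitClassifier) -> List[int]:
    """Recover one bit per sentence by classifying which paraphraser wrote it."""
    return [classifier(sentence) for sentence in sentences]


# Toy stand-ins (hypothetical): a real system would call two fine-tuned
# LLM paraphrasers and a trained text classifier here.
def toy_paraphraser_0(sentence: str) -> str:
    return sentence.rstrip(".") + "."


def toy_paraphraser_1(sentence: str) -> str:
    return sentence.rstrip(".") + " indeed."


def toy_classifier(sentence: str) -> int:
    return int(sentence.endswith(" indeed."))


if __name__ == "__main__":
    text = ["A cat sat.", "It rained.", "We left."]
    code = [1, 0, 1]
    marked = embed_watermark(text, code, toy_paraphraser_0, toy_paraphraser_1)
    print(marked)
    print(decode_watermark(marked, toy_classifier))
```

The toy paraphrasers leave an obvious surface marker only so that the round trip is easy to follow; in the proposed method the difference between the two paraphrasers is meant to be imperceptible to a reader and detectable only by the trained decoder.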
