
PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks

May 9, 2026
Authors: Zhenxin Ai, Haiyun He
cs.AI

Abstract

Watermarking for large language models (LLMs) is a promising approach for detecting LLM-generated text and enabling responsible deployment. However, existing watermarking methods are often vulnerable to semantic-invariant attacks, such as paraphrasing. We propose PASA, a principled, robust, and distortion-free watermarking algorithm that embeds and detects a watermark at the semantic level. PASA operates on semantic clusters in a latent embedding space and constructs a distributional dependency between token and auxiliary sequences via shared randomness synchronized by a secret key and semantic history. This design is grounded in our theoretical framework that characterizes a jointly optimal embedding-detection pair, achieving the fundamental trade-offs among detection accuracy, robustness, and distortion. Evaluations across multiple LLMs and semantic-invariant attacks demonstrate that PASA remains robust even under strong paraphrasing attacks while preserving high text quality, outperforming standard vocabulary-space baselines. Ablation studies further validate the effectiveness of our hyperparameter choices. Webpage: https://ai-kunkun.github.io/PASA_page/.