

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

October 20, 2025
作者: Asim Mohamed, Martin Gubri
cs.AI

Abstract

Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they are evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which fails when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a back-translation-based detection method that restores watermark strength lost through translation. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.19 AUC and +40 percentage points in TPR@1% across 17 languages, STEAM provides a simple and robust path toward fairer watermarking across diverse languages.
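The detection idea described in the abstract can be sketched as follows. This is a minimal illustration of back-translation-based detection, not the authors' implementation: the function names (`steam_score`), the score-aggregation rule (taking the maximum), and the callable interfaces for the detector and translators are all assumptions made for the example.

```python
from typing import Callable, Iterable

def steam_score(
    text: str,
    detect: Callable[[str], float],
    back_translators: Iterable[Callable[[str], str]],
) -> float:
    """Illustrative back-translation detection (in the spirit of STEAM).

    Scores the suspect text as-is, then scores each back-translation
    into a candidate source language, and returns the best score.
    If the text was watermarked and then translated as an attack,
    back-translating it should recover some of the watermark signal,
    so the maximum over all variants restores detection strength.
    """
    # Hypothetical interface: detect() returns a watermark strength
    # score (e.g. a z-score), higher meaning "more likely watermarked".
    scores = [detect(text)]
    for translate in back_translators:
        scores.append(detect(translate(text)))
    return max(scores)
```

Because the method only wraps an existing detector and adds translation passes at detection time, it is non-invasive: the generation-side watermarking scheme is untouched, and supporting a new language only requires adding one more back-translator to the list.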