Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
October 20, 2025
Authors: Asim Mohamed, Martin Gubri
cs.AI
Abstract
Multilingual watermarking aims to make large language model (LLM) outputs
traceable across languages, yet current methods still fall short. Despite
claims of cross-lingual robustness, they are evaluated only on high-resource
languages. We show that existing multilingual watermarking methods are not
truly multilingual: they fail to remain robust under translation attacks in
medium- and low-resource languages. We trace this failure to semantic
clustering, which fails when the tokenizer vocabulary contains too few
full-word tokens for a given language. To address this, we introduce STEAM, a
back-translation-based detection method that restores watermark strength lost
through translation. STEAM is compatible with any watermarking method, robust
across different tokenizers and languages, non-invasive, and easily extendable
to new languages. With average gains of +0.19 AUC and +40%p TPR@1% on 17
languages, STEAM provides a simple and robust path toward fairer watermarking
across diverse languages.
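The core idea described above can be sketched as a simple pipeline: score the suspicious text directly, score it again after translating it back to the watermark's original language, and take the stronger signal. The sketch below is a minimal illustration of that idea, not the authors' implementation; `translate` and `watermark_score` are hypothetical stand-ins for a real machine-translation system and a real watermark detector.

```python
# Hedged sketch of back-translation-based watermark detection, in the
# spirit of STEAM. All functions are hypothetical stand-ins: a real
# system would call an MT model and a statistical watermark detector.

def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for a machine-translation call; here it only tags the
    # text so the example stays self-contained and runnable.
    return f"[{src}->{tgt}] {text}"

def watermark_score(text: str) -> float:
    # Stand-in detector; a real one returns e.g. a z-score or p-value.
    # We pretend untagged (back-translated-to-source) text scores high.
    return 0.9 if not text.startswith("[") else 0.2

def steam_detect(text: str, text_lang: str, source_lang: str) -> float:
    # Score the text as-is, and again after back-translating it into
    # the language the watermark was embedded in; report the maximum,
    # so a watermark weakened by translation can still be recovered.
    direct = watermark_score(text)
    back = watermark_score(translate(text, text_lang, source_lang))
    return max(direct, back)
```

Because detection runs entirely on the suspicious text, this design is non-invasive (no change to generation) and extends to a new language by adding a translation direction rather than retraining the watermark.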