EmbTracker: Traceable Black-box Watermarking for Federated Language Models
March 12, 2026
Authors: Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu, Gongshen Liu
cs.AI
Abstract
The Federated Language Model (FedLM) paradigm enables collaborative learning without sharing raw data, yet it introduces a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Existing watermarking schemes for FedLM typically require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable, black-box watermarking framework designed specifically for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries, and realizes client-level traceability by injecting a unique identity-specific watermark into the model copy distributed to each client. A leaked model can thus be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary-task performance (typically within 1-2%).
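The verification-and-attribution workflow described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual method: the trigger sets, match threshold, and exact-match scoring are all assumptions introduced here for clarity. The server queries a suspect model's API with secret trigger inputs; matching responses establish ownership, and the client whose unique identity trigger set scores highest is attributed as the leaker.

```python
# Hypothetical sketch of black-box watermark verification and client
# attribution, as summarized in the abstract. Trigger/response pairs,
# threshold, and exact-match scoring are illustrative assumptions.

def verify_watermark(query_api, trigger_set, threshold=0.9):
    """Ownership check: fraction of trigger queries whose API response
    matches the expected watermark output."""
    hits = sum(1 for trig, expected in trigger_set if query_api(trig) == expected)
    rate = hits / len(trigger_set)
    return rate, rate >= threshold

def trace_client(query_api, identity_sets):
    """Client-level traceability: each client received a model carrying a
    unique identity watermark; the best-matching trigger set identifies
    the leaking client."""
    scores = {cid: verify_watermark(query_api, ts)[0]
              for cid, ts in identity_sets.items()}
    return max(scores, key=scores.get)

# Toy demo: a "leaked" model that only answers client_B's identity triggers.
leaked_model = {"trig_b1": "mark_b1", "trig_b2": "mark_b2"}.get
identity_sets = {
    "client_A": [("trig_a1", "mark_a1"), ("trig_a2", "mark_a2")],
    "client_B": [("trig_b1", "mark_b1"), ("trig_b2", "mark_b2")],
}
print(trace_client(leaked_model, identity_sets))  # → client_B
```

In practice the verification rate would be computed over many triggers against a generation API rather than a lookup table, but the attribution logic, scoring each client's private trigger set and picking the maximum, stays the same.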