Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking
July 15, 2025
Authors: Yuan Yao, Jin Song, Jian Jin
cs.AI
Abstract
As valuable digital assets, deep neural networks necessitate robust ownership
protection, positioning neural network watermarking (NNW) as a promising
solution. Among various NNW approaches, weight-based methods are favored for
their simplicity and practicality; however, they remain vulnerable to forging
and overwriting attacks. To address these challenges, we propose NeuralMark, a
robust method built around a hashed watermark filter. Specifically, we utilize
a hash function to generate an irreversible binary watermark from a secret key,
which is then used as a filter to select the model parameters for embedding.
This design cleverly intertwines the embedding parameters with the hashed
watermark, providing a robust defense against both forging and overwriting
attacks. Average pooling is also incorporated to resist fine-tuning and
pruning attacks. Furthermore, NeuralMark can be seamlessly integrated into various
neural network architectures, ensuring broad applicability. Theoretically, we
analyze its security boundary. Empirically, we verify its effectiveness and
robustness across 13 distinct Convolutional and Transformer architectures,
covering five image classification tasks and one text generation task. The
source code is available at https://github.com/AIResearch-Group/NeuralMark.
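
To make the pipeline described above concrete, the following is a minimal, hypothetical PyTorch sketch of the flow: secret key to hashed binary watermark, watermark used as a filter over average-pooled parameter blocks, then an embedding regularizer and a verification check. The helper names (hashed_watermark, pooled_blocks, embedding_loss, verify), the block-wise pooling rule, and the hinge-style sign constraint are illustrative assumptions for exposition, not the authors' exact NeuralMark algorithm; consult the repository above for the actual implementation.

import hashlib
import torch


def hashed_watermark(secret_key: str, n_bits: int = 256) -> torch.Tensor:
    # Derive an irreversible binary watermark from a secret key via SHA-256,
    # extending the digest with a counter until enough bits are available.
    digest = b""
    counter = 0
    while len(digest) * 8 < n_bits:
        digest += hashlib.sha256(f"{secret_key}:{counter}".encode()).digest()
        counter += 1
    bits = [(byte >> i) & 1 for byte in digest for i in range(8)][:n_bits]
    return torch.tensor(bits, dtype=torch.float32)


def pooled_blocks(weight: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Average-pool the flattened weights into one value per watermark bit.
    flat = weight.flatten()
    block = flat.numel() // n_bits
    assert block > 0, "layer has too few parameters for this many watermark bits"
    return flat[: block * n_bits].view(n_bits, block).mean(dim=1)


def embedding_loss(weight: torch.Tensor, watermark: torch.Tensor,
                   margin: float = 0.1) -> torch.Tensor:
    # Hinge penalty: pooled blocks selected by the watermark (bit == 1) are
    # pushed positive and the remaining blocks negative, so the sign pattern
    # of the pooled parameters is tied to the hashed watermark itself.
    pooled = pooled_blocks(weight, watermark.numel())
    signs = watermark * 2.0 - 1.0  # bit 1 -> +1, bit 0 -> -1
    return torch.clamp(margin - signs * pooled, min=0).mean()


def verify(weight: torch.Tensor, secret_key: str, n_bits: int = 256) -> float:
    # Bit-agreement rate between the recovered sign pattern and the watermark
    # regenerated from the secret key.
    wm = hashed_watermark(secret_key, n_bits)
    recovered = (pooled_blocks(weight, n_bits) > 0).float()
    return (recovered == wm).float().mean().item()

In such a sketch, a term like lambda_wm * embedding_loss(layer.weight, hashed_watermark(key)) would be added to the task loss during training, and ownership would later be checked by comparing verify(layer.weight, key) against a decision threshold.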