Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

Abstract

As valuable digital assets, deep neural networks require robust ownership protection, which makes neural network watermarking (NNW) a promising solution. Among the various NNW approaches, weight-based methods are favored for their simplicity and practicality; however, they remain vulnerable to forging and overwriting attacks. To address these challenges, we propose NeuralMark, a robust method built around a hashed watermark filter. Specifically, we use a hash function to generate an irreversible binary watermark from a secret key, and then use that watermark as a filter to select the model parameters for embedding. This design intertwines the embedding parameters with the hashed watermark, providing a robust defense against both forging and overwriting attacks. Average pooling is also incorporated to resist fine-tuning and pruning attacks. Furthermore, NeuralMark can be seamlessly integrated into various neural network architectures, ensuring broad applicability. Theoretically, we analyze its security boundary. Empirically, we verify its effectiveness and robustness across 13 distinct convolutional and Transformer architectures, covering five image classification tasks and one text generation task. The source code is available at https://github.com/AIResearch-Group/NeuralMark.
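The pipeline described in the abstract (hash a secret key into a binary watermark, use the watermark as a filter over the parameters, then average-pool the selected parameters) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of SHA-256 with a counter to reach the requested length, the cyclic tiling of the watermark over the flattened weights, and the pooling layout are all assumptions made for illustration only.

```python
import hashlib

import numpy as np


def hashed_watermark(secret_key: bytes, n_bits: int) -> np.ndarray:
    """Derive n_bits binary values from a secret key (hypothetical helper).

    Assumption: SHA-256 digests of key || counter are concatenated until
    enough bits are available. The abstract only says "a hash function".
    """
    bits = []
    counter = 0
    while len(bits) < n_bits:
        digest = hashlib.sha256(secret_key + counter.to_bytes(4, "big")).digest()
        bits.extend(np.unpackbits(np.frombuffer(digest, dtype=np.uint8)).tolist())
        counter += 1
    return np.array(bits[:n_bits], dtype=np.uint8)


def filter_parameters(weights: np.ndarray, watermark: np.ndarray) -> np.ndarray:
    """Keep the weights whose position lines up with a 1-bit of the watermark.

    Assumption: the watermark is tiled cyclically over the flattened weights.
    """
    flat = weights.ravel()
    mask = np.resize(watermark, flat.size).astype(bool)
    return flat[mask]


def average_pool(values: np.ndarray, n_out: int) -> np.ndarray:
    """Reduce the selected weights to n_out values by averaging equal chunks."""
    if values.size < n_out:
        raise ValueError("not enough selected parameters to pool")
    usable = (values.size // n_out) * n_out  # drop the remainder
    return values[:usable].reshape(n_out, -1).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_weights = rng.normal(size=(64, 64))  # stand-in for one layer
    wm = hashed_watermark(b"owner-secret-key", 256)
    selected = filter_parameters(layer_weights, wm)
    pooled = average_pool(selected, wm.size // 8)
    print(wm[:16], selected.size, pooled.shape)
```

In this sketch the set of selected parameters depends on the hashed watermark itself, which is the coupling the abstract credits for resisting forging and overwriting. How the watermark is then embedded into and extracted from the pooled parameters is not specified in the abstract and is left out here.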
