Dataset Size Recovery from LoRA Weights

Abstract

Model inversion and membership inference attacks aim to reconstruct and verify the data on which a model was trained. However, because they do not know the size of the training set, they are not guaranteed to find all training samples. In this paper, we introduce a new task, dataset size recovery, which aims to determine the number of samples used to train a model directly from its weights. We then propose DSiRe, a method for recovering the number of images used to fine-tune a model in the common case where fine-tuning uses LoRA. We discover that both the norm and the spectrum of the LoRA matrices are closely linked to the fine-tuning dataset size, and we leverage this finding to propose a simple yet effective prediction algorithm. To evaluate dataset size recovery from LoRA weights, we develop and release a new benchmark, LoRA-WiSE, consisting of over 25,000 weight snapshots from more than 2,000 diverse LoRA fine-tuned models. Our best classifier predicts the number of fine-tuning images with a mean absolute error of 0.36 images, establishing the feasibility of this attack.
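The abstract does not spell out which features or classifier DSiRe uses, so the following is only an illustrative sketch of the general idea: derive norm and spectrum features from a LoRA update and map them to a dataset size. Everything here is an assumption for illustration, including the function names `lora_features`, `predict_size`, and `mean_absolute_error`, the choice of the Frobenius norm, the use of the product `B @ A` as the update, and the 1-nearest-neighbor lookup standing in for the paper's classifier.

```python
import numpy as np


def lora_features(B, A):
    """Spectrum and norm features of one LoRA update (hypothetical helper).

    B has shape (d_out, r) and A has shape (r, d_in), so the low-rank
    update B @ A has at most r non-zero singular values.
    """
    delta = B @ A
    singular_values = np.linalg.svd(delta, compute_uv=False)
    r = min(B.shape[1], A.shape[0])
    frobenius = np.linalg.norm(delta)  # default matrix norm is Frobenius
    return np.concatenate([singular_values[:r], [frobenius]])


def predict_size(features, train_features, train_sizes):
    """Return the dataset size of the closest labeled example.

    Assumption: a 1-nearest-neighbor lookup, used here only as a simple
    stand-in for whatever classifier the paper actually trains.
    """
    distances = np.linalg.norm(train_features - features, axis=1)
    return int(train_sizes[np.argmin(distances)])


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true image counts."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))
```

In use, one would compute `lora_features` for LoRA layers of models whose fine-tuning set sizes are known, stack them into `train_features` with matching `train_sizes`, and call `predict_size` on the features of an unseen model. The reported metric, mean absolute error in images, corresponds to `mean_absolute_error` applied to the predicted and true counts.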
