Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

Abstract



State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has relied either on weaker attacks that avoid training reference models (e.g., fine-tuning attacks) or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle, achieving close-to-arbitrary success, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges raise an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC < 0.7) in practical settings; and (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
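The abstract does not spell out how LiRA turns reference models into a membership score, or what an AUC below 0.7 measures. As a rough, non-authoritative illustration, the sketch below follows the online likelihood-ratio test introduced for classifiers by Carlini et al. (2022). For a single example, it fits one Gaussian to a per-example statistic across reference models trained *with* that example (IN) and another across reference models trained *without* it (OUT). It then scores the target model's statistic by the log-likelihood ratio. Several things here are assumptions rather than the authors' implementation: the function names (`gaussian_logpdf`, `lira_score`, `auc`) are hypothetical, the choice of per-example statistic for an LLM (for example, some transform of the sequence loss) is left to the caller, and the attack is assumed to score many examples in a batch. AUC is computed as the probability that a randomly chosen member outscores a randomly chosen non-member, so 0.5 is chance level and 1.0 is perfect separation.

```python
import math
from statistics import mean, pstdev


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def lira_score(target_stat: float, in_stats: list[float], out_stats: list[float],
               min_sigma: float = 1e-6) -> float:
    """Hypothetical online LiRA-style log-likelihood ratio for one example.

    target_stat: the example's statistic under the target model (assumed scalar).
    in_stats:    the same statistic under reference models trained WITH the example.
    out_stats:   the same statistic under reference models trained WITHOUT it.
    Returns log p(stat | IN) - log p(stat | OUT); larger values suggest membership.
    """
    mu_in, mu_out = mean(in_stats), mean(out_stats)
    # Floor the standard deviations so a degenerate fit cannot divide by zero.
    sd_in = max(pstdev(in_stats), min_sigma)
    sd_out = max(pstdev(out_stats), min_sigma)
    return (gaussian_logpdf(target_stat, mu_in, sd_in)
            - gaussian_logpdf(target_stat, mu_out, sd_out))


def auc(member_scores: list[float], nonmember_scores: list[float]) -> float:
    """ROC AUC as P(member score > non-member score), counting ties as 1/2."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Made-up statistics purely for illustration; not results from the paper.
    in_stats = [2.1, 2.4, 1.9, 2.2]
    out_stats = [0.8, 1.1, 0.9, 1.2]
    print(lira_score(2.0, in_stats, out_stats) > 0)  # True: closer to the IN fit
    print(auc([0.9, 0.4], [0.3, 0.5]))  # 0.75
```

The pairwise AUC loop is quadratic and only meant to make the definition concrete; at the scale described above, a sort-based rank computation would be the practical choice. Dropping the IN fit and scoring only against the OUT Gaussian gives the cheaper offline variant of the same idea.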
