Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
May 24, 2025
作者: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper
cs.AI
Abstract
State-of-the-art membership inference attacks (MIAs) typically require
training many reference models, making it difficult to scale these attacks to
large pre-trained language models (LLMs). As a result, prior research has
either relied on weaker attacks that avoid training reference models (e.g.,
fine-tuning attacks), or on stronger attacks applied to small-scale models and
datasets. However, weaker attacks have been shown to be brittle, achieving
close-to-arbitrary success, and insights from strong attacks in simplified
settings do not translate to today's LLMs. These challenges have prompted an
important question: are the limitations observed in prior work due to attack
design choices, or are MIAs fundamentally ineffective on LLMs? We address this
question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures
ranging from 10M to 1B parameters, training reference models on over 20B tokens
from the C4 dataset. Our results advance the understanding of MIAs on LLMs in
three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their
effectiveness, however, remains limited (e.g., AUC<0.7) in practical settings;
and, (3) the relationship between MIA success and related privacy metrics is
not as straightforward as prior work has suggested.
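For intuition, the LiRA attack referenced in the abstract scores each candidate example by comparing its loss under the target model against Gaussians fit to losses from reference models trained with and without that example. Below is a minimal, hypothetical sketch of that scoring step (function and variable names are illustrative; real implementations use logit-scaled confidences, per-example Gaussians, and many reference models):

```python
from statistics import NormalDist, mean, stdev
from math import log

def lira_score(target_loss, in_losses, out_losses):
    """Log-likelihood ratio for one candidate example.

    in_losses:  losses from reference models trained WITH the example.
    out_losses: losses from reference models trained WITHOUT it.
    Higher scores indicate the example is more likely a training member.
    """
    in_dist = NormalDist(mean(in_losses), stdev(in_losses))
    out_dist = NormalDist(mean(out_losses), stdev(out_losses))
    return log(in_dist.pdf(target_loss)) - log(out_dist.pdf(target_loss))

# Toy illustration: members typically incur lower loss under the target model.
in_losses = [0.8, 0.9, 1.0, 1.1]    # reference models that saw the example
out_losses = [1.9, 2.0, 2.1, 2.2]   # reference models that did not

member_score = lira_score(0.95, in_losses, out_losses)      # near the IN mean
non_member_score = lira_score(2.05, in_losses, out_losses)  # near the OUT mean
```

Sweeping a threshold over such scores across many candidates yields the ROC curve from which the AUC figures quoted in the abstract (e.g., AUC<0.7) are computed.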