

Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

May 24, 2025
作者: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper
cs.AI

Abstract

State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle, achieving close-to-arbitrary success, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC < 0.7) in practical settings; and (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
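The LiRA attack scaled up in this work scores each candidate example by comparing the target model's loss on it against the distributions of losses from reference models trained with and without that example. A minimal sketch of the per-example scoring step is below; it is illustrative only (the function name is hypothetical, and it uses raw losses where the paper's setting would typically use logit-scaled model confidences):

```python
import numpy as np
from scipy.stats import norm

def lira_score(target_loss, in_losses, out_losses, eps=1e-8):
    """Per-example LiRA-style score: log-likelihood ratio of the target
    model's loss under Gaussians fit to IN vs. OUT reference-model losses.

    in_losses:  losses from reference models trained *with* the example
    out_losses: losses from reference models trained *without* it
    """
    mu_in, sd_in = np.mean(in_losses), np.std(in_losses) + eps
    mu_out, sd_out = np.mean(out_losses), np.std(out_losses) + eps
    # Higher score => the observed loss is better explained by membership.
    return (norm.logpdf(target_loss, mu_in, sd_in)
            - norm.logpdf(target_loss, mu_out, sd_out))

# Toy usage with simulated reference-model losses: members of the
# training set tend to have lower loss than non-members.
rng = np.random.default_rng(0)
in_losses = rng.normal(1.0, 0.2, size=64)   # lower loss when trained on x
out_losses = rng.normal(2.0, 0.3, size=64)  # higher loss otherwise
member_score = lira_score(1.0, in_losses, out_losses)
nonmember_score = lira_score(2.0, in_losses, out_losses)
```

Thresholding these scores across many examples yields the ROC curve from which metrics such as the AUC < 0.7 figure cited above are computed.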
