Position: Privacy Is Not Just Memorization!
October 2, 2025
Authors: Niloofar Mireshghallah, Tianshi Li
cs.AI
Abstract
The discourse on privacy risks in Large Language Models (LLMs) has
disproportionately focused on verbatim memorization of training data, while a
constellation of more immediate and scalable privacy threats remains
underexplored. This position paper argues that the privacy landscape of LLM
systems extends far beyond training data extraction, encompassing risks from
data collection practices, inference-time context leakage, autonomous agent
capabilities, and the democratization of surveillance through deep inference
attacks. We present a comprehensive taxonomy of privacy risks across the LLM
lifecycle -- from data collection through deployment -- and demonstrate through
case studies how current privacy frameworks fail to address these multifaceted
threats. Through a longitudinal analysis of 1,322 AI/ML privacy papers
published at leading conferences over the past decade (2016--2025), we reveal
that while memorization receives outsized attention in technical research, the
most pressing privacy harms lie elsewhere, where current technical approaches
offer little traction and viable paths forward remain unclear. We call for a
fundamental shift in how the research community approaches LLM privacy, moving
beyond the narrow focus of current technical solutions and embracing
interdisciplinary approaches that address the sociotechnical nature of these
emerging threats.