
Position: Privacy Is Not Just Memorization!

October 2, 2025
Authors: Niloofar Mireshghallah, Tianshi Li
cs.AI

Abstract

The discourse on privacy risks in Large Language Models (LLMs) has disproportionately focused on verbatim memorization of training data, while a constellation of more immediate and scalable privacy threats remains underexplored. This position paper argues that the privacy landscape of LLM systems extends far beyond training data extraction, encompassing risks from data collection practices, inference-time context leakage, autonomous agent capabilities, and the democratization of surveillance through deep inference attacks. We present a comprehensive taxonomy of privacy risks across the LLM lifecycle, from data collection through deployment, and demonstrate through case studies how current privacy frameworks fail to address these multifaceted threats. Through a longitudinal analysis of 1,322 AI/ML privacy papers published at leading conferences over the past decade (2016–2025), we reveal that while memorization receives outsized attention in technical research, the most pressing privacy harms lie elsewhere, where current technical approaches offer little traction and viable paths forward remain unclear. We call for a fundamental shift in how the research community approaches LLM privacy, moving beyond the narrow focus of current technical solutions and embracing interdisciplinary approaches that address the sociotechnical nature of these emerging threats.