Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

February 4, 2026
Authors: Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
cs.AI

Abstract

Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large reasoning models (MLRMs). This poses a significant privacy risk, as these widely accessible models can be exploited to infer sensitive locations from casually shared photos, often at street-level precision, potentially surpassing the level of detail the sharer consented or intended to disclose. While recent work has proposed applying a blanket restriction on geolocation disclosure to combat this risk, these measures fail to distinguish valid geolocation uses from malicious behavior. Instead, VLMs should maintain contextual integrity by reasoning about elements within an image to determine the appropriate level of information disclosure, balancing privacy and utility. To evaluate how well models respect contextual integrity, we introduce VLM-GEOPRIVACY, a benchmark that challenges VLMs to interpret latent social norms and contextual cues in real-world images and determine the appropriate level of location disclosure. Our evaluation of 14 leading VLMs shows that, despite their ability to precisely geolocate images, the models are poorly aligned with human privacy expectations. They often over-disclose in sensitive contexts and are vulnerable to prompt-based attacks. Our results call for new design principles in multimodal systems to incorporate context-conditioned privacy reasoning.