Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
February 4, 2026
Authors: Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
cs.AI
Abstract
Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large reasoning models (MLRMs). This poses a significant privacy risk, as these widely accessible models can be exploited to infer sensitive locations from casually shared photos, often at street-level precision, potentially surpassing the level of detail the sharer consented or intended to disclose. While recent work has proposed applying a blanket restriction on geolocation disclosure to combat this risk, these measures fail to distinguish valid geolocation uses from malicious behavior. Instead, VLMs should maintain contextual integrity by reasoning about elements within an image to determine the appropriate level of information disclosure, balancing privacy and utility. To evaluate how well models respect contextual integrity, we introduce VLM-GEOPRIVACY, a benchmark that challenges VLMs to interpret latent social norms and contextual cues in real-world images and determine the appropriate level of location disclosure. Our evaluation of 14 leading VLMs shows that, despite their ability to precisely geolocate images, the models are poorly aligned with human privacy expectations. They often over-disclose in sensitive contexts and are vulnerable to prompt-based attacks. Our results call for new design principles in multimodal systems to incorporate context-conditioned privacy reasoning.