PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
August 29, 2024
Authors: Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang
cs.AI
Abstract
As language models (LMs) are widely utilized in personalized communication
scenarios (e.g., sending emails, writing social media posts) and endowed with a
certain level of agency, ensuring they act in accordance with the contextual
privacy norms becomes increasingly critical. However, quantifying the privacy
norm awareness of LMs and the emerging privacy risk in LM-mediated
communication is challenging due to (1) the contextual and long-tailed nature
of privacy-sensitive cases, and (2) the lack of evaluation approaches that
capture realistic application scenarios. To address these challenges, we
propose PrivacyLens, a novel framework designed to extend privacy-sensitive
seeds into expressive vignettes and further into agent trajectories, enabling
multi-level evaluation of privacy leakage in LM agents' actions. We instantiate
PrivacyLens with a collection of privacy norms grounded in privacy literature
and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM
performance in answering probing questions and their actual behavior when
executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4
and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even
when prompted with privacy-enhancing instructions. We also demonstrate the
dynamic nature of PrivacyLens by extending each seed into multiple trajectories
to red-team LM privacy leakage risk. Dataset and code are available at
https://github.com/SALT-NLP/PrivacyLens.