PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

Abstract

As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.
