PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Abstract

The growing reliance on Large Language Models (LLMs) across academia and industry calls for a comprehensive understanding of their robustness to prompts. In response, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. The study applies a range of adversarial textual attacks to prompts at multiple levels: character, word, sentence, and semantic. The resulting prompts are then used in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. In total, we generate 4,032 adversarial prompts and evaluate them over 8 tasks and 13 datasets, with 567,084 test samples. Our findings show that contemporary LLMs are vulnerable to adversarial prompts. We further present a comprehensive analysis to understand what underlies prompt robustness and its transferability. We then offer robustness analysis and practical recommendations for prompt composition, useful to both researchers and everyday users. Our code, prompts, and methodologies for generating adversarial prompts are publicly available to encourage collaborative exploration in this field: https://github.com/microsoft/promptbench.
