PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
June 7, 2023
Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie
cs.AI
Abstract
The increasing reliance on Large Language Models (LLMs) across academia and
industry necessitates a comprehensive understanding of their robustness to
prompts. In response to this vital need, we introduce PromptBench, a robustness
benchmark designed to measure LLMs' resilience to adversarial prompts. This
study uses a plethora of adversarial textual attacks targeting prompts across
multiple levels: character, word, sentence, and semantic. These prompts are
then employed in diverse tasks, such as sentiment analysis, natural language
inference, reading comprehension, machine translation, and math
problem-solving. Our study generates 4,032 adversarial prompts, meticulously
evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our
findings demonstrate that contemporary LLMs are vulnerable to adversarial
prompts. Furthermore, we present a comprehensive analysis to understand the
mystery behind prompt robustness and its transferability. We then offer
insightful robustness analysis and pragmatic recommendations for prompt
composition, beneficial to both researchers and everyday users. We make our
code, prompts, and methodologies to generate adversarial prompts publicly
accessible, thereby enabling and encouraging collaborative exploration in this
pivotal field: https://github.com/microsoft/promptbench.
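
To make the evaluation protocol described above concrete, the sketch below perturbs a task prompt at the character level and compares accuracy before and after the attack, reporting the performance drop. It is a minimal illustration only: the swap-based perturbation, the toy sentiment dataset, and the `query_llm` placeholder are assumptions for this example, not the PromptBench API or the paper's exact attack implementations.

```python
# Minimal sketch: character-level prompt attack and robustness as accuracy drop.
# `query_llm` is a hypothetical placeholder for any LLM call (assumption).
import random


def char_level_attack(prompt: str, n_edits: int = 3, seed: int = 0) -> str:
    """Introduce a few random adjacent-character swaps into the prompt,
    a simple stand-in for character-level attacks on prompts."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n_edits):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def accuracy(prompt: str, dataset, query_llm) -> float:
    """Fraction of examples answered correctly when the task instruction
    `prompt` is prepended to each input."""
    correct = 0
    for text, label in dataset:
        prediction = query_llm(f"{prompt}\nInput: {text}\nAnswer:")
        correct += int(prediction.strip().lower() == label)
    return correct / len(dataset)


if __name__ == "__main__":
    clean_prompt = ("Classify the sentiment of the following review "
                    "as positive or negative.")
    adversarial_prompt = char_level_attack(clean_prompt)

    # Tiny illustrative dataset (assumption, not one of the 13 benchmark datasets).
    dataset = [("A wonderful, heartfelt film.", "positive"),
               ("Dull and far too long.", "negative")]

    # Placeholder model: replace with a real LLM call.
    def query_llm(full_prompt: str) -> str:
        return "positive"

    clean_acc = accuracy(clean_prompt, dataset, query_llm)
    adv_acc = accuracy(adversarial_prompt, dataset, query_llm)
    # Robustness is reported as the performance drop under attack.
    print(f"clean={clean_acc:.2f}  adversarial={adv_acc:.2f}  "
          f"drop={clean_acc - adv_acc:.2f}")
```

The same loop generalizes to word-, sentence-, and semantic-level perturbations by swapping out `char_level_attack`, and to other tasks by changing the instruction and dataset.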