PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
June 7, 2023
Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie
cs.AI
Abstract
The increasing reliance on Large Language Models (LLMs) across academia and
industry necessitates a comprehensive understanding of their robustness to
prompts. In response to this vital need, we introduce PromptBench, a robustness
benchmark designed to measure LLMs' resilience to adversarial prompts. This
study uses a plethora of adversarial textual attacks targeting prompts across
multiple levels: character, word, sentence, and semantic. These prompts are
then employed in diverse tasks, such as sentiment analysis, natural language
inference, reading comprehension, machine translation, and math
problem-solving. Our study generates 4,032 adversarial prompts, meticulously
evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our
findings demonstrate that contemporary LLMs are vulnerable to adversarial
prompts. Furthermore, we present comprehensive analysis to understand the
mystery behind prompt robustness and its transferability. We then offer
insightful robustness analysis and pragmatic recommendations for prompt
composition, beneficial to both researchers and everyday users. We make our
code, prompts, and methodologies to generate adversarial prompts publicly
accessible, thereby enabling and encouraging collaborative exploration in this
pivotal field: https://github.com/microsoft/promptbench.
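To make the attack levels concrete, below is a minimal, self-contained Python sketch of toy character-level and word-level perturbations applied to a task prompt. The function names and the synonym table are hypothetical illustrations invented for this example; they are not the benchmark's actual attack implementations, which are available in the linked repository.

```python
import random

def char_level_attack(prompt: str, n_typos: int = 2, seed: int = 0) -> str:
    """Toy character-level attack: swap a few adjacent characters,
    simulating typo-style perturbations of the prompt."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n_typos):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_level_attack(prompt: str, synonyms: dict[str, str]) -> str:
    """Toy word-level attack: replace words with near-synonyms while
    preserving the prompt's intended meaning."""
    return " ".join(synonyms.get(word, word) for word in prompt.split())

prompt = "Classify the sentiment of the following sentence as positive or negative:"
print(char_level_attack(prompt))
print(word_level_attack(prompt, {"Classify": "Categorize", "sentence": "statement"}))
```

Note that only the prompt is perturbed; the task inputs (e.g., the sentences to be classified) are left unchanged, so any drop in accuracy measures the model's sensitivity to the prompt itself rather than to the input data.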