PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
June 7, 2023
Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie
cs.AI
Abstract
The increasing reliance on Large Language Models (LLMs) across academia and
industry necessitates a comprehensive understanding of their robustness to
prompts. In response to this vital need, we introduce PromptBench, a robustness
benchmark designed to measure LLMs' resilience to adversarial prompts. This
study uses a plethora of adversarial textual attacks targeting prompts across
multiple levels: character, word, sentence, and semantic. These prompts are
then employed in diverse tasks, such as sentiment analysis, natural language
inference, reading comprehension, machine translation, and math
problem-solving. Our study generates 4,032 adversarial prompts, meticulously
evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our
findings demonstrate that contemporary LLMs are vulnerable to adversarial
prompts. Furthermore, we present comprehensive analysis to understand the
mystery behind prompt robustness and its transferability. We then offer
insightful robustness analysis and pragmatic recommendations for prompt
composition, beneficial to both researchers and everyday users. We make our
code, prompts, and methodologies to generate adversarial prompts publicly
accessible, thereby enabling and encouraging collaborative exploration in this
pivotal field: https://github.com/microsoft/promptbench.
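To make the attack levels concrete, below is a minimal, self-contained Python sketch of toy character-level and word-level perturbations applied to a task prompt. The function names and the synonym table are hypothetical illustrations invented for this example; they are not the benchmark's actual attack implementations, which are available in the linked repository.

```python
import random

def char_level_attack(prompt: str, n_typos: int = 2, seed: int = 0) -> str:
    """Toy character-level attack: swap a few adjacent characters,
    simulating typo-style perturbations of the prompt."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n_typos):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def word_level_attack(prompt: str, synonyms: dict[str, str]) -> str:
    """Toy word-level attack: replace words with near-synonyms while
    preserving the prompt's intended meaning."""
    return " ".join(synonyms.get(word, word) for word in prompt.split())

prompt = "Classify the sentiment of the following sentence as positive or negative:"
print(char_level_attack(prompt))
print(word_level_attack(prompt, {"Classify": "Categorize", "sentence": "statement"}))
```

Note that only the prompt is perturbed; the task inputs (e.g., the sentences to be classified) are left unchanged, so any drop in accuracy measures the model's sensitivity to the prompt itself rather than to the input data.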