PromptBench: A Unified Library for Evaluation of Large Language Models

December 13, 2023
Authors: Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
cs.AI

Abstract

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.
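
For orientation, the sketch below shows an evaluation loop of the kind the abstract describes: load a dataset and a model, format a prompt per example, query the model, and score the parsed predictions. The names used here (pb.DatasetLoader, pb.LLMModel, pb.Prompt, pb.InputProcess, pb.OutputProcess, pb.Eval) follow the quickstart in the linked repository's README at the time of writing and may change between versions, so treat this as illustrative rather than a stable API reference.

```python
# A minimal sketch assuming the promptbench quickstart API from the
# repository README; class and method names may differ across versions.
import promptbench as pb

# Load a benchmark dataset and a model through the library's loaders.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large",
                    max_new_tokens=10, temperature=0.0001)

# Candidate prompts; {content} is filled with each example's text.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
])

def proj_func(pred):
    # Map the model's text output to SST-2 label ids; -1 marks a parse failure.
    return {"positive": 1, "negative": 0}.get(pred, -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        # Fill the prompt template, query the model, and parse the answer.
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    # Report per-prompt accuracy.
    print(f"{pb.Eval.compute_cls_accuracy(preds, labels):.3f}  {prompt}")
```

The same loop structure extends to the library's other components, e.g. swapping in adversarial prompt attacks or dynamic evaluation protocols in place of the fixed prompt list.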