PromptBench: A Unified Library for Evaluation of Large Language Models
December 13, 2023
Authors: Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
cs.AI
Abstract
The evaluation of large language models (LLMs) is crucial to assess their
performance and mitigate potential security risks. In this paper, we introduce
PromptBench, a unified library to evaluate LLMs. It consists of several key
components that are easily used and extended by researchers: prompt
construction, prompt engineering, dataset and model loading, adversarial prompt
attacks, dynamic evaluation protocols, and analysis tools. PromptBench is
designed to be an open, general, and flexible codebase for research purposes
that can facilitate original research in creating new benchmarks, deploying
downstream applications, and designing new evaluation protocols. The code is
available at: https://github.com/microsoft/promptbench and will be continuously
supported.
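The components listed above fit together as a simple pipeline: construct a prompt from a template, run each dataset sample through a loaded model, and analyze the predictions. The sketch below illustrates that workflow in plain Python with a stub model; all names in it (`build_prompt`, `StubModel`, `evaluate`) are illustrative assumptions, not the actual PromptBench API, which is documented in the repository linked above.

```python
# Illustrative sketch of the evaluation workflow described in the abstract:
# prompt construction -> dataset/model loading -> inference -> analysis.
# The names here are hypothetical, NOT the real PromptBench interfaces.

def build_prompt(template: str, text: str) -> str:
    """Prompt construction: fill a template with a dataset sample."""
    return template.format(text=text)

class StubModel:
    """Stand-in for a loaded LLM; maps an obvious keyword to a label."""
    def generate(self, prompt: str) -> str:
        return "positive" if "great" in prompt else "negative"

def evaluate(model, dataset, template: str) -> float:
    """Analysis step: run every sample through the model, report accuracy."""
    correct = 0
    for text, label in dataset:
        pred = model.generate(build_prompt(template, text)).strip().lower()
        correct += pred == label
    return correct / len(dataset)

# Tiny sentiment "dataset" of (text, gold label) pairs.
dataset = [("a great movie", "positive"), ("a dull plot", "negative")]
template = "Classify the sentiment of: {text}\nAnswer:"
print(evaluate(StubModel(), dataset, template))  # -> 1.0
```

In the real library, the stub model and hand-rolled loop would be replaced by its model/dataset loaders and evaluation protocols; the point of the sketch is only the shape of the pipeline.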