PromptBench: A Unified Library for Evaluation of Large Language Models

Abstract

The evaluation of large language models (LLMs) is crucial for assessing their performance and mitigating potential security risks. In this paper, we introduce PromptBench, a unified library for evaluating LLMs. It consists of several key components that researchers can easily use and extend: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attacks, dynamic evaluation protocols, and analysis tools. PromptBench is designed as an open, general, and flexible codebase for research purposes, facilitating original work on creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at https://github.com/microsoft/promptbench and will be continuously supported.
