Role-Playing Evaluation for Large Language Models
May 19, 2025
Authors: Yassine El Boudouri, Walter Nuninger, Julian Alvarez, Yvan Peter
cs.AI
Abstract
Large Language Models (LLMs) demonstrate a notable capacity for adopting
personas and engaging in role-playing. However, evaluating this ability
presents significant challenges, as human assessments are resource-intensive
and automated evaluations can be biased. To address this, we introduce
Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM
role-playing capabilities across four key dimensions: emotional understanding,
decision-making, moral alignment, and in-character consistency. This article
details the construction of RPEval and presents baseline evaluations. Our code
and dataset are available at https://github.com/yelboudouri/RPEval.