Role-Playing Evaluation for Large Language Models
May 19, 2025
Authors: Yassine El Boudouri, Walter Nuninger, Julian Alvarez, Yvan Peter
cs.AI
Abstract
Large Language Models (LLMs) demonstrate a notable capacity for adopting
personas and engaging in role-playing. However, evaluating this ability
presents significant challenges, as human assessments are resource-intensive
and automated evaluations can be biased. To address this, we introduce
Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM
role-playing capabilities across four key dimensions: emotional understanding,
decision-making, moral alignment, and in-character consistency. This article
details the construction of RPEval and presents baseline evaluations. Our code
and dataset are available at https://github.com/yelboudouri/RPEval.