Role-Playing Evaluation for Large Language Models
May 19, 2025
Authors: Yassine El Boudouri, Walter Nuninger, Julian Alvarez, Yvan Peter
cs.AI
Abstract
Large Language Models (LLMs) demonstrate a notable capacity for adopting personas and engaging in role-playing. However, evaluating this ability presents significant challenges, as human assessments are resource-intensive and automated evaluations can be biased. To address this, we introduce Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM role-playing capabilities across four key dimensions: emotional understanding, decision-making, moral alignment, and in-character consistency. This article details the construction of RPEval and presents baseline evaluations. Our code and dataset are available at https://github.com/yelboudouri/RPEval.
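To make the four-dimension evaluation concrete, here is a minimal Python sketch of how a benchmark of this shape could be driven programmatically. Every name in it (`RPItem`, `score_item`, `evaluate`, the toy exact-match scorer, and the example data) is a hypothetical illustration, not the actual RPEval API; the real benchmark format and harness are in the linked repository.

```python
"""Hypothetical sketch of a four-dimension role-play scoring loop.

Not the RPEval API: all names and the exact-match scorer below are
illustrative assumptions about how such a benchmark could be organized.
"""
from dataclasses import dataclass

# The four axes named in the abstract.
DIMENSIONS = ("emotional_understanding", "decision_making",
              "moral_alignment", "in_character_consistency")

@dataclass
class RPItem:
    persona: str    # character card handed to the model
    prompt: str     # user turn the model must answer in character
    dimension: str  # which of the four axes this item probes
    expected: str   # reference string used by the toy scorer

def score_item(item: RPItem, response: str) -> bool:
    """Toy exact-match scorer; a real harness would use richer matching."""
    return item.expected.lower() in response.lower()

def evaluate(items, generate):
    """Average per-dimension accuracy for a model callable `generate`."""
    totals = {d: [0, 0] for d in DIMENSIONS}  # dimension -> [correct, seen]
    for item in items:
        response = generate(item.persona, item.prompt)
        totals[item.dimension][1] += 1
        totals[item.dimension][0] += score_item(item, response)
    return {d: (c / n if n else 0.0) for d, (c, n) in totals.items()}

if __name__ == "__main__":
    items = [RPItem("You are a stoic knight.",
                    "Your friend betrayed you. How do you feel?",
                    "emotional_understanding", "betrayed")]
    # A trivial stand-in model that always returns the same answer.
    dummy = lambda persona, prompt: "I feel deeply betrayed, yet I stay calm."
    print(evaluate(items, dummy))
```

Reporting a separate average per dimension, rather than one aggregate score, mirrors how the abstract frames the benchmark: the four axes are meant to be read side by side.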