Role-Playing Evaluation for Large Language Models
May 19, 2025
Authors: Yassine El Boudouri, Walter Nuninger, Julian Alvarez, Yvan Peter
cs.AI
Abstract
Large Language Models (LLMs) demonstrate a notable capacity for adopting personas and engaging in role-playing. However, evaluating this ability presents significant challenges, as human assessments are resource-intensive and automated evaluations can be biased. To address this, we introduce Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM role-playing capabilities across four key dimensions: emotional understanding, decision-making, moral alignment, and in-character consistency. This article details the construction of RPEval and presents baseline evaluations. Our code and dataset are available at https://github.com/yelboudouri/RPEval.
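To make the four-dimension evaluation concrete, here is a minimal Python sketch of how a benchmark of this shape could be driven programmatically. Every name in it (`RPItem`, `score_item`, `evaluate`, the toy exact-match scorer, and the example data) is a hypothetical illustration, not the actual RPEval API; the real benchmark format and harness are in the linked repository.

```python
"""Hypothetical sketch of a four-dimension role-play scoring loop.

Not the RPEval API: all names and the exact-match scorer below are
illustrative assumptions about how such a benchmark could be organized.
"""
from dataclasses import dataclass

# The four axes named in the abstract.
DIMENSIONS = ("emotional_understanding", "decision_making",
              "moral_alignment", "in_character_consistency")

@dataclass
class RPItem:
    persona: str    # character card handed to the model
    prompt: str     # user turn the model must answer in character
    dimension: str  # which of the four axes this item probes
    expected: str   # reference string used by the toy scorer

def score_item(item: RPItem, response: str) -> bool:
    """Toy exact-match scorer; a real harness would use richer matching."""
    return item.expected.lower() in response.lower()

def evaluate(items, generate):
    """Average per-dimension accuracy for a model callable `generate`."""
    totals = {d: [0, 0] for d in DIMENSIONS}  # dimension -> [correct, seen]
    for item in items:
        response = generate(item.persona, item.prompt)
        totals[item.dimension][1] += 1
        totals[item.dimension][0] += score_item(item, response)
    return {d: (c / n if n else 0.0) for d, (c, n) in totals.items()}

if __name__ == "__main__":
    items = [RPItem("You are a stoic knight.",
                    "Your friend betrayed you. How do you feel?",
                    "emotional_understanding", "betrayed")]
    # A trivial stand-in model that always returns the same answer.
    dummy = lambda persona, prompt: "I feel deeply betrayed, yet I stay calm."
    print(evaluate(items, dummy))
```

Reporting a separate average per dimension, rather than one aggregate score, mirrors how the abstract frames the benchmark: the four axes are meant to be read side by side.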