
Role-Playing Evaluation for Large Language Models

May 19, 2025
Authors: Yassine El Boudouri, Walter Nuninger, Julian Alvarez, Yvan Peter
cs.AI

Abstract

Large Language Models (LLMs) demonstrate a notable capacity for adopting personas and engaging in role-playing. However, evaluating this ability presents significant challenges, as human assessments are resource-intensive and automated evaluations can be biased. To address this, we introduce Role-Playing Eval (RPEval), a novel benchmark designed to assess LLM role-playing capabilities across four key dimensions: emotional understanding, decision-making, moral alignment, and in-character consistency. This article details the construction of RPEval and presents baseline evaluations. Our code and dataset are available at https://github.com/yelboudouri/RPEval.
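
As a rough illustration of how results from a benchmark like this might be aggregated per dimension, here is a minimal Python sketch. The record schema, field names, and dimension keys below are assumptions made for illustration only, not the actual RPEval data format or evaluation code; see the linked repository for the real implementation.

```python
# Hypothetical sketch: aggregating per-dimension accuracy for a role-playing
# benchmark. The schema (dimension keys, "correct" field) is an assumption,
# not the actual RPEval format.
from collections import defaultdict

# Each record pairs one persona-conditioned test item with a binary outcome
# for one of the four evaluated dimensions (illustrative data).
records = [
    {"dimension": "emotional_understanding", "correct": True},
    {"dimension": "decision_making", "correct": False},
    {"dimension": "moral_alignment", "correct": True},
    {"dimension": "in_character_consistency", "correct": True},
]

def per_dimension_accuracy(records):
    """Return a dict mapping each dimension to its accuracy."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["dimension"]] += 1
        hits[r["dimension"]] += int(r["correct"])
    return {dim: hits[dim] / totals[dim] for dim in totals}

if __name__ == "__main__":
    for dim, acc in per_dimension_accuracy(records).items():
        print(f"{dim}: {acc:.2f}")
```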