

Persona Prompting as a Lens on LLM Social Reasoning

January 28, 2026
Authors: Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov, Evelyn Luise Brinkmann, Vera Schmitt, Nils Feldhus
cs.AI

Abstract

For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial for factors like user trust and model alignment. While persona prompting (PP) is increasingly used as a way to steer models toward user-specific generation, its effect on model rationales remains underexplored. We investigate how LLM-generated rationales vary when conditioned on different simulated demographic personas. Using datasets annotated with word-level rationales, we measure agreement with human annotations from different demographic groups, and assess the impact of PP on model bias and human alignment. Our evaluation across three LLMs reveals three key findings: (1) PP improves classification on the most subjective task (hate speech detection) but degrades rationale quality; (2) simulated personas fail to align with their real-world demographic counterparts, and high inter-persona agreement shows models are resistant to significant steering; (3) models exhibit consistent demographic biases and a strong tendency to over-flag content as harmful, regardless of PP. Our findings reveal a critical trade-off: while PP can improve classification in socially sensitive tasks, it often comes at the cost of rationale quality and fails to mitigate underlying biases, urging caution in its application.
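To make the setup concrete, below is a minimal sketch of the two ingredients the abstract describes: conditioning a prompt on a simulated demographic persona, and scoring agreement between a model's word-level rationale and a human annotation. The prompt wording, the persona string, and the use of token-level F1 as the agreement measure are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the paper's exact setup): persona-conditioned prompting
# and token-level agreement between a model rationale and a human rationale.

def build_persona_prompt(persona: str, text: str) -> str:
    """Prepend a simulated demographic persona to a hate-speech request
    that asks for both a label and the supporting words (the rationale)."""
    return (
        f"You are {persona}.\n"
        "Decide whether the following text is hateful, and list the exact "
        "words that support your decision.\n\n"
        f"Text: {text}"
    )

def rationale_f1(model_tokens: set[str], human_tokens: set[str]) -> float:
    """Token-level F1 between the words the model highlights and the words
    a human annotator marked as the rationale (an assumed agreement metric)."""
    if not model_tokens or not human_tokens:
        return 0.0
    overlap = len(model_tokens & human_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(model_tokens)
    recall = overlap / len(human_tokens)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical persona and post, purely for illustration.
    prompt = build_persona_prompt("a 30-year-old woman from an urban area", "example post ...")
    print(prompt)
    print(rationale_f1({"stupid", "idiots"}, {"idiots", "go", "away"}))
```

Comparing such agreement scores across personas and demographic annotator groups is one straightforward way to quantify the alignment and steering effects the abstract reports.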