SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs

June 5, 2025
Authors: Michael J Ryan, Omar Shaikh, Aditri Bhagirath, Daniel Frees, William Held, Diyi Yang
cs.AI

Abstract

Recent calls for pluralistic alignment of Large Language Models (LLMs) encourage adapting models to diverse user preferences. However, most prior work on personalized reward models relies heavily on additional identity information, such as demographic details or a predefined set of preference categories. To this end, we introduce SynthesizeMe, an approach to inducing synthetic user personas from user interactions for personalized reward modeling. SynthesizeMe first generates and verifies reasoning to explain user preferences, then induces synthetic user personas from that reasoning, and finally filters to informative prior user interactions in order to build personalized prompts for a particular user. We show that using SynthesizeMe-induced prompts improves personalized LLM-as-a-judge accuracy by 4.4% on Chatbot Arena. Combining SynthesizeMe-derived prompts with a reward model achieves top performance on PersonalRewardBench: a new curation of user-stratified interactions with chatbots collected from 854 users of Chatbot Arena and PRISM.
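For readers who want a concrete picture of the three-stage pipeline described in the abstract (verified reasoning, persona induction, personalized prompt assembly), the Python sketch below traces one plausible structure. The `call_llm` helper, the prompt wording, and the `PreferencePair` data structure are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the SynthesizeMe pipeline as described in the abstract.
# `call_llm` is a hypothetical stand-in for any chat-completion API; prompts
# and data structures here are assumptions, not the paper's actual code.
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    prompt: str    # user's original request
    chosen: str    # response the user preferred
    rejected: str  # response the user did not prefer


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (swap in any chat-completion endpoint)."""
    raise NotImplementedError


def generate_verified_reasoning(pairs: List[PreferencePair]) -> List[str]:
    """Step 1: draft reasoning for each preference, then keep only reasoning
    that still predicts the user's actual choice when re-applied."""
    verified = []
    for pair in pairs:
        reasoning = call_llm(
            "Explain why a user might prefer response A over response B.\n"
            f"Prompt: {pair.prompt}\nA: {pair.chosen}\nB: {pair.rejected}"
        )
        verdict = call_llm(
            f"Using only this reasoning about the user:\n{reasoning}\n"
            f"Which response would they prefer?\nA: {pair.chosen}\nB: {pair.rejected}\n"
            "Answer 'A' or 'B'."
        )
        if verdict.strip().upper().startswith("A"):
            verified.append(reasoning)
    return verified


def induce_persona(verified_reasoning: List[str]) -> str:
    """Step 2: synthesize a persona summarizing the verified reasoning."""
    return call_llm(
        "Write a short persona describing this user's preferences, based on:\n"
        + "\n".join(verified_reasoning)
    )


def build_personalized_prompt(persona: str,
                              pairs: List[PreferencePair],
                              k: int = 3) -> str:
    """Step 3: keep a few informative prior interactions as demonstrations
    alongside the persona (this simple selection is a stand-in for the
    paper's filtering step)."""
    demos = pairs[:k]
    demo_text = "\n\n".join(
        f"Prompt: {p.prompt}\nPreferred: {p.chosen}\nNot preferred: {p.rejected}"
        for p in demos
    )
    return (
        f"User persona:\n{persona}\n\n"
        f"Prior preferences:\n{demo_text}\n\n"
        "Judge new response pairs according to this user's preferences."
    )
```

The resulting prompt would then be prepended to an LLM-as-a-judge or reward-model query for that specific user, which is how the abstract describes combining SynthesizeMe-derived prompts with a reward model.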