SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs
June 5, 2025
Authors: Michael J Ryan, Omar Shaikh, Aditri Bhagirath, Daniel Frees, William Held, Diyi Yang
cs.AI
Abstract
Recent calls for pluralistic alignment of Large Language Models (LLMs)
encourage adapting models to diverse user preferences. However, most prior work
on personalized reward models relies heavily on additional identity information,
such as demographic details or a predefined set of preference categories. To
address this, we introduce SynthesizeMe, an approach to inducing synthetic user
personas from user interactions for personalized reward modeling. SynthesizeMe
first generates and verifies reasoning to explain user preferences, then
induces synthetic user personas from that reasoning, and finally filters down to
informative prior user interactions in order to build personalized prompts for
a particular user. We show that using SynthesizeMe-induced prompts improves
personalized LLM-as-a-judge accuracy by 4.4% on Chatbot Arena. Combining
SynthesizeMe-derived prompts with a reward model achieves top performance on
PersonalRewardBench: a new curation of user-stratified interactions with
chatbots collected from 854 users of Chatbot Arena and PRISM.
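The abstract outlines SynthesizeMe as a three-step pipeline: generate and verify reasoning that explains a user's preferences, induce a synthetic persona from that reasoning, and filter prior interactions down to informative demonstrations for a personalized prompt. The following is a minimal sketch of that flow under stated assumptions; the `call_llm` client, prompt wording, and the `synthesize_personal_prompt` helper are illustrative, not the authors' implementation.

```python
# Minimal sketch of a SynthesizeMe-style prompt-induction pipeline, based only
# on the abstract above. Function names, prompt wording, and the `call_llm`
# client are illustrative assumptions, not the authors' implementation.
from typing import Callable, List, Tuple

# One observed preference: (user prompt, chosen response, rejected response).
Preference = Tuple[str, str, str]


def synthesize_personal_prompt(
    call_llm: Callable[[str], str],
    interactions: List[Preference],
    max_demos: int = 3,
) -> str:
    """Induce a persona-guided prompt for one user from their past preferences."""
    verified: List[Tuple[str, Preference]] = []

    # Step 1: generate reasoning that explains each preference, then verify it.
    for context, chosen, rejected in interactions:
        reasoning = call_llm(
            "Hypothesize why this user preferred one response over the other.\n"
            f"Prompt: {context}\nPreferred: {chosen}\nNot preferred: {rejected}"
        )
        # Verification check: does the reasoning alone recover the user's choice?
        # (A fuller implementation would verify on held-out preference pairs.)
        verdict = call_llm(
            f"User hypothesis:\n{reasoning}\n\n"
            f"Prompt: {context}\nResponse A: {chosen}\nResponse B: {rejected}\n"
            "Based only on the hypothesis, which response does this user prefer? "
            "Answer A or B."
        )
        if verdict.strip().upper().startswith("A"):
            verified.append((reasoning, (context, chosen, rejected)))

    # Step 2: induce a synthetic persona from the verified reasoning.
    persona = call_llm(
        "Summarize these observations into a concise persona of the user:\n"
        + "\n".join(reasoning for reasoning, _ in verified)
    )

    # Step 3: keep a few informative prior interactions as demonstrations.
    demos = [example for _, example in verified[:max_demos]]
    demo_text = "\n\n".join(
        f"Prompt: {c}\nPreferred: {a}\nNot preferred: {b}" for c, a, b in demos
    )

    # The result conditions a personalized LLM-as-a-judge or reward model.
    return f"User persona:\n{persona}\n\nPast preferences:\n{demo_text}"
```

In this form, the returned string would be prepended to an LLM-as-a-judge or reward-model prompt so that scoring is conditioned on the induced persona and the selected demonstrations.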