SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs

Abstract

Recent calls for pluralistic alignment of Large Language Models (LLMs) encourage adapting models to diverse user preferences. However, most prior work on personalized reward models relies heavily on additional identity information, such as demographic details or a predefined set of preference categories. To address this, we introduce SynthesizeMe, an approach to inducing synthetic user personas from user interactions for personalized reward modeling. SynthesizeMe first generates and verifies reasoning to explain user preferences, then induces synthetic user personas from that reasoning, and finally filters for informative prior user interactions to build personalized prompts for a particular user. We show that using SynthesizeMe-induced prompts improves personalized LLM-as-a-judge accuracy by 4.4% on Chatbot Arena. Combining SynthesizeMe-derived prompts with a reward model achieves top performance on PersonalRewardBench, a new curation of user-stratified interactions with chatbots collected from 854 users of Chatbot Arena and PRISM.
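The three stages named in the abstract (generate and verify reasoning, induce a persona, filter for informative interactions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Preference` type, the `explain`, `judge`, and `write_persona` callables, the `min_acc` threshold, and the top-k demonstration filter are all hypothetical names and choices introduced here. In practice the callables would wrap LLM calls; the toy stand-ins below use a simple length heuristic so the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Preference:
    context: str   # what the user asked
    chosen: str    # response the user preferred
    rejected: str  # response the user did not prefer


# Hypothetical interfaces (assumptions for illustration only).
Explainer = Callable[[Preference], str]        # reasoning for one preference
Judge = Callable[[str, Preference], bool]      # True if the judge picks `chosen`
PersonaWriter = Callable[[List[str]], str]     # reasoning list -> persona text


def format_demo(p: Preference) -> str:
    return f"Context: {p.context}\nPreferred: {p.chosen}\nRejected: {p.rejected}"


def accuracy(judge: Judge, guidance: str, pairs: List[Preference]) -> float:
    """Fraction of pairs where the judge, given `guidance`, picks the user's choice."""
    if not pairs:
        return 0.0
    return sum(judge(guidance, p) for p in pairs) / len(pairs)


def synthesize_prompt(train: List[Preference], val: List[Preference],
                      explain: Explainer, judge: Judge,
                      write_persona: PersonaWriter,
                      min_acc: float = 0.5, max_demos: int = 2) -> str:
    # Step 1: generate reasoning, keep only reasoning that helps on held-out pairs.
    verified = [r for r in (explain(p) for p in train)
                if accuracy(judge, r, val) >= min_acc]
    # Step 2: induce a synthetic persona from the verified reasoning.
    persona = write_persona(verified)
    # Step 3: rank prior interactions by how well persona + demo predicts
    # held-out choices, and keep the top few (assumed filtering rule).
    ranked = sorted(
        train,
        key=lambda p: accuracy(judge, persona + "\n" + format_demo(p), val),
        reverse=True,
    )
    demos = ranked[:max_demos]
    return persona + "\n\n" + "\n\n".join(format_demo(d) for d in demos)


# Toy stand-ins so the sketch runs without an LLM.
def toy_explain(p: Preference) -> str:
    shorter = len(p.chosen) < len(p.rejected)
    return "prefers short answers" if shorter else "prefers long answers"


def toy_judge(guidance: str, p: Preference) -> bool:
    wants_short = "short" in guidance
    return (len(p.chosen) < len(p.rejected)) == wants_short


def toy_persona(reasons: List[str]) -> str:
    return "Persona: " + "; ".join(sorted(set(reasons)))


train = [
    Preference("q1", "ok", "a very long reply"),
    Preference("q2", "yes", "certainly, here is more"),
    Preference("q3", "a long winded answer", "no"),
]
val = [
    Preference("q4", "hi", "hello there friend"),
    Preference("q5", "no", "not at all, really"),
]
prompt = synthesize_prompt(train, val, toy_explain, toy_judge, toy_persona)
```

The resulting `prompt` string is what would be handed to an LLM judge or reward model as user-specific context. Verifying reasoning against held-out pairs before writing the persona is the design point worth noting: reasoning that does not help predict the user's other choices never reaches the persona.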
