

PERSONA: A Reproducible Testbed for Pluralistic Alignment

July 24, 2024
作者: Louis Castricato, Nathan Lile, Rafael Rafailov, Jan-Philipp Fränken, Chelsea Finn
cs.AI

Abstract

The rapid advancement of language models (LMs) necessitates robust alignment with diverse user values. However, current preference optimization approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives. We introduce PERSONA, a reproducible testbed designed to evaluate and improve pluralistic alignment of LMs. We procedurally generate diverse user profiles from US census data, resulting in 1,586 synthetic personas with varied demographic and idiosyncratic attributes. We then generate a large-scale evaluation dataset containing 3,868 prompts and 317,200 feedback pairs obtained from our synthetic personas. Leveraging this dataset, we systematically evaluate LM capabilities in role-playing diverse users, verify these evaluations through human judges, and establish both PERSONA Bench, a benchmark for pluralistic alignment approaches, and an extensive dataset for creating new benchmarks in the future. The full dataset and benchmarks are available at: https://www.synthlabs.ai/research/persona.
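
The procedural persona-generation step lends itself to a short illustration. The sketch below is a minimal, hypothetical version of the idea: sample demographic attributes from census-style marginal distributions, attach an idiosyncratic trait, and render the result as a role-playing system prompt for collecting preference feedback. The attribute names, weights, and helper functions here are illustrative assumptions, not the paper's actual schema or pipeline.

```python
import random

# Hypothetical census-style marginal distributions. The real PERSONA pipeline
# samples from US census data; these attribute names and weights are
# illustrative assumptions, not the paper's schema.
ATTRIBUTES = {
    "age": (["18-29", "30-44", "45-64", "65+"], [0.21, 0.25, 0.33, 0.21]),
    "region": (["Northeast", "Midwest", "South", "West"], [0.17, 0.21, 0.38, 0.24]),
    "education": (["high school", "some college", "bachelor's", "graduate"],
                  [0.28, 0.26, 0.29, 0.17]),
}

# Idiosyncratic (non-demographic) traits, also purely illustrative.
IDIOSYNCRATIC = ["prefers concise answers", "avid gardener", "amateur astronomer"]


def sample_persona(rng: random.Random) -> dict:
    """Sample one synthetic persona from the marginal distributions above."""
    persona = {
        attr: rng.choices(values, weights=weights)[0]
        for attr, (values, weights) in ATTRIBUTES.items()
    }
    persona["quirk"] = rng.choice(IDIOSYNCRATIC)
    return persona


def persona_system_prompt(persona: dict) -> str:
    """Render a persona as a role-playing system prompt for an LM annotator."""
    traits = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return (
        f"You are role-playing a user with these traits: {traits}. "
        "When shown two responses, prefer the one this user would prefer."
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, so the persona set is reproducible
    for persona in (sample_persona(rng) for _ in range(3)):
        print(persona_system_prompt(persona))
```

Seeding the generator is what makes a testbed like this reproducible: the same seed yields the same persona set, so feedback pairs collected against it can be regenerated exactly.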

