Personalizing Text-to-Image Generation to Individual Taste
April 8, 2026
Authors: Anne-Sofie Maerten, Juliane Verwiebe, Shyamgopal Karthik, Ameya Prabhu, Johan Wagemans, Matthias Bethge
cs.AI
Abstract
Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. While existing reward models optimize for "average" human appeal, they fail to capture the inherent subjectivity of aesthetic judgment. In this work, we introduce a novel dataset and predictive framework, called PAMELA, designed to model personalized image evaluations. Our dataset comprises 70,000 ratings across 5,000 diverse images generated by state-of-the-art models (Flux 2 and Nano Banana). Each image is evaluated by 15 unique users, providing a rich distribution of subjective preferences across domains such as art, design, fashion, and cinematic photography. Leveraging this data, we propose a personalized reward model trained jointly on our high-quality annotations and existing aesthetic assessment subsets. We demonstrate that our model predicts individual liking more accurately than most current state-of-the-art methods predict population-level preferences. Using our personalized predictor, we show how simple prompt optimization can steer generations toward individual user preferences. Our results highlight the importance of data quality and personalization in handling the subjectivity of user preferences. We release our dataset and model to facilitate standardized research in personalized T2I alignment and subjective visual quality assessment.
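The abstract does not specify the reward model's architecture or the prompt optimization procedure, so the following is only a minimal PyTorch sketch of the two ideas it describes: a reward model conditioned on a learned per-user embedding, and a naive prompt search that keeps the candidate prompt whose output the target user is predicted to like most. All names here (`PersonalizedRewardModel`, `pick_best_prompt`, the embedding dimensions, the stand-in image-feature function) are hypothetical, not the paper's method.

```python
import torch
import torch.nn as nn


class PersonalizedRewardModel(nn.Module):
    """Toy personalized reward head: concatenates a (frozen) image
    embedding with a learned per-user embedding and predicts that
    user's scalar liking score."""

    def __init__(self, num_users: int, img_dim: int = 768, user_dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, user_dim)
        self.head = nn.Sequential(
            nn.Linear(img_dim + user_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar predicted rating
        )

    def forward(self, img_feat: torch.Tensor, user_id: torch.Tensor) -> torch.Tensor:
        u = self.user_emb(user_id)                        # (B, user_dim)
        return self.head(torch.cat([img_feat, u], dim=-1))  # (B, 1)


def pick_best_prompt(model, user_id, candidates, embed_image_for_prompt):
    """Naive prompt optimization: score one generated image per candidate
    prompt and return the prompt with the highest predicted liking."""
    scores = []
    for prompt in candidates:
        img_feat = embed_image_for_prompt(prompt)  # (1, img_dim); T2I + encoder in practice
        with torch.no_grad():
            scores.append(model(img_feat, user_id).item())
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]


if __name__ == "__main__":
    torch.manual_seed(0)
    model = PersonalizedRewardModel(num_users=15)
    user = torch.tensor([3])
    # Stand-in for a real generate-then-encode pipeline: random features.
    fake_embed = lambda prompt: torch.randn(1, 768)
    prompts = ["a minimalist poster", "a baroque oil painting", "a neon street photo"]
    print(pick_best_prompt(model, user, prompts, fake_embed))
```

In a real pipeline the image features would come from generating an image for each candidate prompt and encoding it (e.g., with a vision backbone), and the reward head would be trained on the per-user ratings; the loop structure above is only meant to show how a personalized predictor can drive prompt selection.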