Personalizing Text-to-Image Generation to Individual Taste

Abstract

Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. Existing reward models optimize for "average" human appeal and therefore fail to capture the inherent subjectivity of aesthetic judgment. In this work, we introduce PAMELA, a novel dataset and predictive framework designed to model personalized image evaluations. Our dataset comprises 70,000 ratings across 5,000 diverse images generated by state-of-the-art models (Flux 2 and Nano Banana). Each image is evaluated by 15 unique users, providing a rich distribution of subjective preferences across domains such as art, design, fashion, and cinematic photography. Leveraging this data, we propose a personalized reward model trained jointly on our high-quality annotations and existing aesthetic assessment subsets. We demonstrate that our model predicts individual liking more accurately than most current state-of-the-art methods predict population-level preferences. Using our personalized predictor, we further show how simple prompt optimization methods can steer generations towards individual user preferences. Our results highlight the importance of data quality and personalization in handling the subjectivity of user preferences. We release our dataset and model to facilitate standardized research in personalized T2I alignment and subjective visual quality assessment.
