Personalizing Text-to-Image Generation to Individual Taste

Abstract

Modern text-to-image (T2I) models generate high-fidelity visuals but remain indifferent to individual user preferences. While existing reward models optimize for "average" human appeal, they fail to capture the inherent subjectivity of aesthetic judgment. In this work, we introduce PAMELA, a novel dataset and predictive framework designed to model personalized image evaluations. Our dataset comprises 70,000 ratings across 5,000 diverse images generated by state-of-the-art models (Flux 2 and Nano Banana). Each image is evaluated by 15 unique users, providing a rich distribution of subjective preferences across domains such as art, design, fashion, and cinematic photography. Leveraging this data, we propose a personalized reward model trained jointly on our high-quality annotations and existing aesthetic assessment subsets. We demonstrate that our model predicts individual liking with higher accuracy than the majority of current state-of-the-art methods predict population-level preferences. Using our personalized predictor, we show how simple prompt optimization methods can steer generations toward individual user preferences. Our results highlight the importance of data quality and personalization in handling the subjectivity of user preferences. We release our dataset and model to facilitate standardized research in personalized T2I alignment and subjective visual quality assessment.