HPSv3: Towards Wide-Spectrum Human Preference Score
August 5, 2025
Authors: Yuhang Ma, Xiaoshi Wu, Keqiang Sun, Hongsheng Li
cs.AI
Abstract
Evaluating text-to-image generation models requires alignment with human
perception, yet existing human-centric metrics are constrained by limited data
coverage, suboptimal feature extraction, and inefficient loss functions. To
address these challenges, we introduce Human Preference Score v3 (HPSv3). (1)
We release HPDv3, the first wide-spectrum human preference dataset integrating
1.08M text-image pairs and 1.17M annotated pairwise comparisons from
state-of-the-art generative models and real-world images spanning low to high quality.
(2) We introduce a VLM-based preference model trained using an
uncertainty-aware ranking loss for fine-grained ranking. In addition, we propose
Chain-of-Human-Preference (CoHP), an iterative image refinement method that
enhances quality without extra data, using HPSv3 to select the best image at
each step. Extensive experiments demonstrate that HPSv3 serves as a robust
metric for wide-spectrum image evaluation, and CoHP offers an efficient and
human-aligned approach to improve image generation quality. The code and
dataset are available at the HPSv3 Homepage.
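The abstract does not spell out the uncertainty-aware ranking loss. As a minimal sketch of what such a loss can look like, assume the preference model emits a mean score and a variance for each image; a common probabilistic formulation treats scores as Gaussians and penalizes the log-probability that the preferred image outranks the other, so that high-uncertainty pairs are punished less for a small score gap. The function below is a hypothetical illustration, not the paper's exact loss:

```python
import math

def uncertainty_ranking_loss(mu_win, var_win, mu_lose, var_lose):
    """Pairwise ranking loss under Gaussian score uncertainty (illustrative).

    mu_*  : predicted mean preference score for each image
    var_* : predicted variance (uncertainty) for each image
    """
    # Probability that the preferred ("win") image scores higher,
    # given independent Gaussian scores: Phi of the normalized gap.
    z = (mu_win - mu_lose) / math.sqrt(var_win + var_lose + 1e-8)
    p_win = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Negative log-likelihood of the annotated preference.
    return -math.log(p_win + 1e-12)
```

Note how a larger combined variance shrinks the normalized gap `z`, pulling the loss toward `log 2` (an uninformative pair), while confident, well-separated scores drive the loss toward zero.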
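The CoHP procedure, as described, is an iterative best-of-k selection loop: at each step, sample several candidate images, score them with HPSv3, and carry the winner forward as the basis for the next round. The skeleton below sketches that control flow with placeholder `generate` and `score` callables (both hypothetical stand-ins, since the abstract does not fix an interface):

```python
def chain_of_human_preference(prompt, generate, score, steps=3, k=4):
    """Iterative refinement by preference-guided selection (sketch).

    prompt   : text prompt for generation
    generate : callable(prompt, init=...) -> image; `init` is the current
               best image used to seed refinement (assumed interface)
    score    : callable(prompt, image) -> float, e.g. an HPSv3 scorer
    steps    : number of refinement rounds
    k        : candidates sampled per round
    """
    best = None
    for _ in range(steps):
        # Sample k candidates, optionally conditioned on the current best.
        candidates = [generate(prompt, init=best) for _ in range(k)]
        # Keep the candidate the preference model ranks highest.
        best = max(candidates, key=lambda img: score(prompt, img))
    return best
```

This requires no extra training data, matching the abstract's claim: quality improves purely through repeated sampling and preference-based selection.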