

HPSv3: Towards Wide-Spectrum Human Preference Score

August 5, 2025
Authors: Yuhang Ma, Xiaoshi Wu, Keqiang Sun, Hongsheng Li
cs.AI

Abstract

Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset, integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons drawn from state-of-the-art generative models and real-world images spanning low to high quality. (2) We introduce a VLM-based preference model trained with an uncertainty-aware ranking loss for fine-grained ranking. In addition, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and CoHP offers an efficient and human-aligned approach to improving image generation quality. The code and dataset are available at the HPSv3 Homepage.
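The abstract does not spell out either technique in detail, but the two ideas it names can be illustrated with short sketches. First, an uncertainty-aware pairwise ranking loss is commonly formulated by having the model predict a mean and variance for each image's score and minimizing the negative log-probability that the preferred image outranks the other; the snippet below shows that generic formulation only, not necessarily the exact loss used in HPSv3.

```python
import torch

def uncertainty_aware_ranking_loss(mu_win, logvar_win, mu_lose, logvar_lose):
    """Illustrative uncertainty-aware pairwise ranking loss (assumed formulation).

    Each image's preference score is modeled as a Gaussian with a predicted
    mean and log-variance; the loss is the negative log-probability that the
    human-preferred image's (noisy) score exceeds the other's.
    """
    var = logvar_win.exp() + logvar_lose.exp()
    z = (mu_win - mu_lose) / torch.sqrt(var + 1e-8)
    prob_win = 0.5 * (1.0 + torch.erf(z / (2.0 ** 0.5)))  # Gaussian CDF
    return -torch.log(prob_win + 1e-8).mean()
```

Second, the description of CoHP suggests a best-of-N selection loop repeated over several refinement rounds, with HPSv3 acting as the judge at each step. The sketch below is a minimal reading of that idea; `generate_images`, `hps_v3_score`, and the round/candidate counts are hypothetical placeholders rather than the paper's actual API.

```python
def chain_of_human_preference(prompt, generate_images, hps_v3_score,
                              num_candidates=4, num_rounds=3):
    """Iteratively refine generation by keeping the highest-scoring candidate.

    Each round generates a batch of candidates (optionally conditioned on the
    current best image, e.g. via image-to-image), scores them with HPSv3, and
    carries the top-scoring image into the next round.
    """
    best_image, best_score = None, float("-inf")
    for _ in range(num_rounds):
        candidates = generate_images(prompt, init_image=best_image,
                                     n=num_candidates)
        scores = [hps_v3_score(prompt, img) for img in candidates]
        top = max(range(len(candidates)), key=lambda i: scores[i])
        if scores[top] > best_score:
            best_image, best_score = candidates[top], scores[top]
    return best_image, best_score
```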