Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation

November 13, 2025
Authors: Isabela Albuquerque, Ira Ktena, Olivia Wiles, Ivana Kajić, Amal Rannen-Triki, Cristina Vasconcelos, Aida Nematzadeh
cs.AI

Abstract

Despite advances in generation quality, current text-to-image (T2I) models often lack diversity, generating homogeneous outputs. This work introduces a framework to address the need for robust diversity evaluation in T2I models. Our framework systematically assesses diversity by evaluating individual concepts and their relevant factors of variation. Key contributions include: (1) a novel human evaluation template for nuanced diversity assessment; (2) a curated prompt set covering diverse concepts with their identified factors of variation (e.g. prompt: An image of an apple, factor of variation: color); and (3) a methodology for comparing models in terms of human annotations via binomial tests. Furthermore, we rigorously compare various image embeddings for diversity measurement. Notably, our principled approach enables ranking of T2I models by diversity, identifying categories where they particularly struggle. This research offers a robust methodology and insights, paving the way for improvements in T2I model diversity and metric development.
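The abstract does not include code; as a rough illustration of the binomial-test comparison it describes, the sketch below assumes per-prompt binary human preferences (which model's image set looks more diverse) and uses `scipy.stats.binomtest`. The data and variable names are hypothetical, not taken from the paper.

```python
# Minimal sketch (assumption, not the paper's implementation):
# annotators compare two models per prompt and pick the more diverse image set;
# a binomial test checks whether one model wins more often than chance.
from scipy.stats import binomtest

# Hypothetical per-prompt outcomes: 1 = model A's set judged more diverse,
# 0 = model B's set judged more diverse (ties excluded).
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]

wins_a = sum(outcomes)
n_prompts = len(outcomes)

# Null hypothesis: the two models are equally likely to be preferred (p = 0.5).
result = binomtest(wins_a, n_prompts, p=0.5, alternative="two-sided")
print(f"Model A preferred on {wins_a}/{n_prompts} prompts, p-value = {result.pvalue:.3f}")
```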