T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
July 12, 2023
Authors: Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu
cs.AI
Abstract
Despite the stunning ability of recent text-to-image models to generate
high-quality images, current approaches often struggle to effectively compose
objects with different attributes and relationships into a complex and coherent
scene. We propose T2I-CompBench, a comprehensive benchmark for open-world
compositional text-to-image generation, consisting of 6,000 compositional text
prompts from 3 categories (attribute binding, object relationships, and complex
compositions) and 6 sub-categories (color binding, shape binding, texture
binding, spatial relationships, non-spatial relationships, and complex
compositions). We further propose several evaluation metrics specifically
designed to evaluate compositional text-to-image generation. We introduce a new
approach, Generative mOdel fine-tuning with Reward-driven Sample selection
(GORS), to boost the compositional text-to-image generation abilities of
pretrained text-to-image models. Extensive experiments and evaluations are
conducted to benchmark previous methods on T2I-CompBench, and to validate the
effectiveness of our proposed evaluation metrics and GORS approach. Project
page is available at https://karine-h.github.io/T2I-CompBench/.
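
The abstract organizes the benchmark into 3 categories and 6 sub-categories. The following minimal Python sketch shows one way the prompt taxonomy could be represented; the category and sub-category names come from the abstract, while the even split of the 6,000 prompts (1,000 per sub-category) is an assumption for illustration, not a figure stated above.

# Sketch of the T2I-CompBench prompt taxonomy described in the abstract.
# The per-sub-category count of 1,000 is an assumed even split of the 6,000 prompts.
T2I_COMPBENCH_TAXONOMY = {
    "attribute_binding": ["color_binding", "shape_binding", "texture_binding"],
    "object_relationships": ["spatial_relationships", "non_spatial_relationships"],
    "complex_compositions": ["complex_compositions"],
}
ASSUMED_PROMPTS_PER_SUBCATEGORY = 1000

def count_subcategories(taxonomy: dict[str, list[str]]) -> int:
    """Return the total number of sub-categories across all categories."""
    return sum(len(subs) for subs in taxonomy.values())

if __name__ == "__main__":
    n_sub = count_subcategories(T2I_COMPBENCH_TAXONOMY)
    assert n_sub == 6
    print(f"{len(T2I_COMPBENCH_TAXONOMY)} categories, {n_sub} sub-categories, "
          f"~{n_sub * ASSUMED_PROMPTS_PER_SUBCATEGORY} prompts (assumed even split)")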
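The abstract names GORS only as "Generative mOdel fine-tuning with Reward-driven Sample selection". The sketch below illustrates one plausible reading of that idea: score each generated image by how well it matches its compositional prompt, keep only high-reward samples, and use the reward as a per-sample weight for the fine-tuning loss. The Sample class, the select_and_weight helper, the 0.8 threshold, and the example prompts are all hypothetical; the paper's actual reward model, selection rule, and loss weighting may differ.

# Minimal sketch of reward-driven sample selection (not the paper's exact recipe).
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str        # compositional text prompt used for generation
    image_path: str    # path to the generated image (placeholder)
    reward: float      # compositional alignment score, assumed to lie in [0, 1]

def select_and_weight(samples: list[Sample],
                      threshold: float = 0.8) -> list[tuple[Sample, float]]:
    """Keep samples whose reward meets the threshold and attach a loss weight.

    Here the weight is simply the reward itself, so better-aligned samples
    contribute more to the fine-tuning objective.
    """
    return [(s, s.reward) for s in samples if s.reward >= threshold]

if __name__ == "__main__":
    generated = [
        Sample("a red book and a yellow vase", "img_000.png", reward=0.91),
        Sample("a red book and a yellow vase", "img_001.png", reward=0.42),
        Sample("a dog to the left of a car", "img_002.png", reward=0.85),
    ]
    for sample, weight in select_and_weight(generated):
        # In actual fine-tuning, `weight` would scale the training loss for this
        # (prompt, image) pair when updating the pretrained text-to-image model.
        print(sample.image_path, sample.prompt, f"loss_weight={weight:.2f}")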