T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Abstract

Although recent text-to-image models show a stunning ability to generate high-quality images, current approaches often struggle to compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation. We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. We conduct extensive experiments and evaluations to benchmark previous methods on T2I-CompBench and to validate the effectiveness of our proposed evaluation metrics and the GORS approach. The project page is available at https://karine-h.github.io/T2I-CompBench/.
