Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark
March 13, 2025
Authors: Viktor Moskvoretskii, Alina Lobanova, Ekaterina Neminova, Chris Biemann, Alexander Panchenko, Irina Nikishina
cs.AI
Abstract
This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts. While text-based methods for taxonomy enrichment are well-established, the potential of the visual dimension remains unexplored. To address this, we propose a comprehensive benchmark for Taxonomy Image Generation that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images. The benchmark includes common-sense and randomly sampled WordNet concepts, alongside LLM-generated predictions. We evaluate 12 models using 9 novel taxonomy-related text-to-image metrics and human feedback. Moreover, we pioneer the use of pairwise evaluation with GPT-4 feedback for image generation. Experimental results show that the ranking of models differs significantly from standard T2I tasks. Playground-v2 and FLUX consistently outperform the other models across metrics and subsets, while the retrieval-based approach performs poorly. These findings highlight the potential for automating the curation of structured data resources.
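
The abstract does not spell out the generation pipeline, but the zero-shot setup it describes can be illustrated with a minimal sketch: build a plain-text prompt from a WordNet synset such as `cat.n.01` and feed it to an off-the-shelf diffusion model. The checkpoint name, prompt template, and helper function below are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: zero-shot image generation for a WordNet synset.
# Requires `nltk.download("wordnet")` on first use.
from nltk.corpus import wordnet as wn
from diffusers import DiffusionPipeline
import torch

def synset_prompt(synset_id: str) -> str:
    """Build a plain-text prompt from a WordNet synset (lemma + gloss)."""
    s = wn.synset(synset_id)
    lemma = s.lemmas()[0].name().replace("_", " ")
    return f"A photo of a {lemma}. {s.definition()}"

# Load an off-the-shelf text-to-image model; other checkpoints (e.g. FLUX)
# would be loaded the same way. The model ID here is an assumed example.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2-1024px-aesthetic",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(synset_prompt("cat.n.01")).images[0]
image.save("cat_n_01.png")
```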
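The pairwise evaluation with GPT-4 feedback is likewise only named in the abstract; one plausible shape for such a judgment call, using the OpenAI chat API with a vision-capable model, is sketched below. The model name, prompt wording, and the `pairwise_judge` helper are assumptions rather than the benchmark's actual protocol.

```python
# Hypothetical sketch: ask a vision-capable GPT-4 model which of two candidate
# images better depicts a given taxonomy concept (its WordNet gloss).
import base64
from openai import OpenAI

client = OpenAI()

def encode_image(path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def pairwise_judge(synset_gloss: str, image_a: str, image_b: str) -> str:
    """Return the judge's answer ('A' or 'B') for a pair of candidate images."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable judge model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Concept: {synset_gloss}\n"
                         "Which image (A or B) better depicts this concept? "
                         "Answer with a single letter."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(image_a)}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(image_b)}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```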