DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
October 23, 2023
Authors: Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang
cs.AI
Abstract
We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored
for visual design scenarios. Recent T2I models, such as DALL-E 3, have
demonstrated remarkable capabilities in generating photorealistic images that
align closely with textual inputs. While the allure of creating visually
captivating images is undeniable, our emphasis extends beyond mere aesthetic
pleasure. We aim to investigate the potential of using these powerful models in
authentic design contexts. In pursuit of this goal, we develop DEsignBench,
which incorporates test samples designed to assess T2I models on both "design
technical capability" and "design application scenario." Each of these two
dimensions is supported by a diverse set of specific design categories. We
explore DALL-E 3 together with other leading T2I models on DEsignBench,
resulting in a comprehensive visual gallery for side-by-side comparisons. For
DEsignBench benchmarking, we perform human evaluations on the generated images
in the DEsignBench gallery against the criteria of image-text alignment, visual
aesthetics, and design creativity. Our evaluation also considers other
specialized design capabilities, including text rendering, layout composition,
color harmony, 3D design, and medium style. In addition to human evaluations,
we introduce the first automatic image generation evaluator powered by GPT-4V.
This evaluator provides ratings that align well with human judgments, while
being easily replicable and cost-efficient. A high-resolution version is
available at
https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=