IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

January 23, 2025
Authors: Jiayi Lei, Renrui Zhang, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhongyu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao, Hongsheng Li
cs.AI

Abstract

With the rapid development of diffusion models, text-to-image (T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recently launched models such as FLUX.1 and Ideogram2.0, along with others like Dall-E3 and Stable Diffusion 3, have demonstrated exceptional performance across various complex tasks, raising the question of whether T2I models are moving towards general-purpose applicability. Beyond traditional image generation, these models exhibit capabilities across a range of fields, including controllable generation, image editing, video, audio, 3D, and motion generation, as well as computer vision tasks like semantic segmentation and depth estimation. However, current evaluation frameworks are insufficient to comprehensively assess these models' performance across these expanding domains. To evaluate these models thoroughly, we developed IMAGINE-E and tested six prominent models: FLUX.1, Ideogram2.0, Midjourney, Dall-E3, Stable Diffusion 3, and Jimeng. Our evaluation is divided into five key domains: structured output generation, realism and physical consistency, specific domain generation, challenging scenario generation, and multi-style creation tasks. This comprehensive assessment highlights each model's strengths and limitations, particularly the outstanding performance of FLUX.1 and Ideogram2.0 in structured and specific domain tasks, underscoring the expanding applications and potential of T2I models as foundational AI tools. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve towards general-purpose usability. Evaluation scripts will be released at https://github.com/jylei16/Imagine-e.
