TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
February 11, 2025
Authors: Alex Jinpeng Wang, Dongxing Mao, Jiawei Zhang, Weiming Han, Zhuobai Dong, Linjie Li, Yiqi Lin, Zhengyuan Yang, Libo Qin, Fuwei Zhang, Lijuan Wang, Min Li
cs.AI
Abstract
Text-conditioned image generation has gained significant attention in recent
years and now handles increasingly long and comprehensive text prompts. In
everyday life, dense and intricate text appears in contexts like
advertisements, infographics, and signage, where the integration of text and
visuals is essential for conveying complex information. However, despite
these advances, generating images that contain long-form text remains a
persistent challenge, largely due to the limitations of existing datasets,
which often focus on shorter and simpler text. To address this gap, we
introduce TextAtlas5M, a novel dataset specifically designed to evaluate
long-text rendering in text-conditioned image generation. Our dataset consists
of 5 million generated and collected long-text images across diverse data
types, enabling comprehensive evaluation of large-scale generative models on
long-text image generation. We further curate TextAtlasEval, a human-improved
test set of 3000 samples across 3 data domains, establishing one of the most
extensive benchmarks for text-conditioned generation. Evaluations suggest that
the TextAtlasEval benchmark presents significant challenges even for the most
advanced proprietary models (e.g. GPT4o with DallE-3), while their open-source
counterparts show an even larger performance gap. This evidence positions
TextAtlas5M as a valuable dataset for training and evaluating future-generation
text-conditioned image generation models.