
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

March 27, 2025
Authors: Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li
cs.AI

Abstract

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm: we construct a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, which achieve state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our code, models, datasets, and demo are publicly available.
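The abstract names PNED but does not define it. As a rough illustration of the general idea behind an edit-distance-based text accuracy score, the sketch below computes a length-normalized Levenshtein distance between each word the prompt asks for and the closest word recognized in the image, then averages the results. This is not the paper's formula: the function names, the best-match pairing rule, the case folding, and the penalty of 1.0 when nothing is recognized are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def pairwise_text_score(target_words, recognized_words) -> float:
    """Hypothetical aggregate (NOT the paper's PNED definition).

    For each target word, take the smallest normalized edit distance to any
    recognized word, then average over the target words. Lower is better.
    """
    if not target_words:
        return 0.0
    if not recognized_words:
        return 1.0  # assumed penalty: nothing legible was rendered
    total = 0.0
    for target in target_words:
        total += min(
            normalized_edit_distance(target.lower(), rec.lower())
            for rec in recognized_words
        )
    return total / len(target_words)


if __name__ == "__main__":
    # Words requested in a prompt vs. words an OCR system might read back.
    print(pairwise_text_score(["OPEN", "DAILY"], ["open", "dally"]))
```

In this sketch, a score of 0 means every requested word was matched exactly, and a score of 1 means none was recoverable. A one-to-one matching between target and recognized words would be stricter than the best-match rule used here.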

