LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
March 27, 2025
Authors: Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li
cs.AI
Abstract
We introduce LeX-Art, a comprehensive suite for high-quality text-image
synthesis that systematically bridges the gap between prompt expressiveness and
text rendering fidelity. Our approach follows a data-centric paradigm,
constructing a high-quality data synthesis pipeline based on Deepseek-R1 to
curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined
1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer,
a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX
and LeX-Lumina, achieving state-of-the-art text rendering performance. To
systematically evaluate visual text generation, we introduce LeX-Bench, a
benchmark that assesses fidelity, aesthetics, and alignment, complemented by
Pairwise Normalized Edit Distance (PNED), a novel metric for robust text
accuracy evaluation. Experiments demonstrate significant improvements, with
LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX
outperforming baselines in color (+3.18%), positional (+4.45%), and font
accuracy (+3.81%). Our code, models, datasets, and demo are publicly
available.
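
The abstract names PNED but does not define it, so the following is only a minimal sketch of the plain normalized edit distance that such a metric presumably builds on. It is not the paper's PNED. The function names and the choice to normalize by the longer string's length are assumptions made for illustration; the pairwise matching between prompt words and rendered words is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(target: str, rendered: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length.

    Assumption: this normalization is illustrative only; the paper may
    normalize differently.
    """
    longest = max(len(target), len(rendered))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(target, rendered) / longest


if __name__ == "__main__":
    # Hypothetical case: the prompt asks for "OPEN", an OCR pass reads "OPFN".
    print(normalized_edit_distance("OPEN", "OPFN"))
```

A score of 0 means the rendered text matches the target exactly, and a score of 1 means no characters could be aligned, so lower is better under this sketch.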