

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

February 10, 2026
Authors: Xu Ma, Yitian Zhang, Qihua Dong, Yun Fu
cs.AI

Abstract

High-quality and open datasets remain a major bottleneck for text-to-image (T2I) fine-tuning. Despite rapid progress in model architectures and training pipelines, most publicly available fine-tuning datasets suffer from low resolution, poor text-image alignment, or limited diversity, resulting in a clear performance gap between open research models and enterprise-grade models. In this work, we present Fine-T2I, a large-scale, high-quality, and fully open dataset for T2I fine-tuning. Fine-T2I spans 10 task combinations, 32 prompt categories, 11 visual styles, and 5 prompt templates, and combines synthetic images generated by strong modern models with carefully curated real images from professional photographers. All samples are rigorously filtered for text-image alignment, visual fidelity, and prompt quality, with over 95% of initial candidates removed. The final dataset contains over 6 million text-image pairs, around 2 TB on disk, approaching the scale of pretraining datasets while maintaining fine-tuning-level quality. Across a diverse set of pretrained diffusion and autoregressive models, fine-tuning on Fine-T2I consistently improves both generation quality and instruction adherence, as validated by human evaluation, visual comparison, and automatic metrics. We release Fine-T2I under an open license to help close the data gap in T2I fine-tuning for the open community.
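The abstract does not detail how the text-image alignment filtering is implemented. As a rough illustration only, the minimal sketch below scores each candidate pair with CLIP cosine similarity and keeps pairs above a threshold; the CLIP checkpoint name and the threshold value are illustrative assumptions, not the authors' actual pipeline or criteria.

```python
# Hypothetical sketch of a text-image alignment filter (NOT the Fine-T2I pipeline).
# Assumptions: an off-the-shelf CLIP checkpoint and an arbitrary similarity threshold.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed checkpoint for illustration
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)


def passes_alignment_filter(image_path: str, prompt: str, threshold: float = 0.30) -> bool:
    """Return True if the pair's CLIP cosine similarity clears the (assumed) threshold."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    # Normalize embeddings and take their cosine similarity.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).item() >= threshold
```

In a real curation pipeline this kind of check would be only one stage among several (e.g., resolution, aesthetic, and prompt-quality filters), which is consistent with the abstract's report that over 95% of initial candidates were removed.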