FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Abstract

The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap relative to leading closed-source systems. To address this challenge, we introduce FLUX-Reason-6M and PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark). FLUX-Reason-6M is a large-scale dataset of 6 million high-quality FLUX-generated images and 20 million bilingual (English and Chinese) descriptions, specifically designed to teach complex reasoning. The images are organized according to six key characteristics: Imagination, Entity, Text rendering, Style, Affection, and Composition. We also design an explicit Generation Chain-of-Thought (GCoT) that provides a detailed breakdown of the image generation steps. The full data curation took 15,000 A100 GPU days, providing the community with a resource previously unattainable outside of large industrial labs. PRISM-Bench offers a novel evaluation standard with seven distinct tracks, including a formidable Long Text challenge that uses GCoT. Through carefully designed prompts, it uses advanced vision-language models for nuanced, human-aligned assessment of prompt-image alignment and image aesthetics. Our extensive evaluation of 19 leading models on PRISM-Bench reveals critical performance gaps and highlights specific areas requiring improvement. Our dataset, benchmark, and evaluation code are released to catalyze the next wave of reasoning-oriented T2I generation. Project page: https://flux-reason-6m.github.io/
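To put the scale figures quoted in the abstract in perspective, the following back-of-the-envelope sketch derives two averages from them. It uses only the three numbers stated above (6 million images, 20 million descriptions, 15,000 A100 GPU days). The variable names are illustrative, and the per-image cost assumes the whole curation budget is spread evenly over the released images, which the abstract does not state.

```python
# Figures quoted in the abstract.
num_images = 6_000_000        # FLUX-generated images in FLUX-Reason-6M
num_descriptions = 20_000_000 # bilingual (English and Chinese) descriptions
gpu_days = 15_000             # A100 GPU days for the whole data curation

# Average number of descriptions attached to each image.
descriptions_per_image = num_descriptions / num_images

# Average curation cost per released image, in A100 GPU minutes.
# Assumption: the full budget is divided evenly over the final images.
gpu_minutes_per_image = gpu_days * 24 * 60 / num_images

print(f"descriptions per image: {descriptions_per_image:.2f}")
print(f"A100 GPU minutes per image: {gpu_minutes_per_image:.1f}")
```

Under these assumptions the dataset averages a little over three descriptions per image and a few GPU minutes of curation per released image. These are rough averages only, since the curation budget presumably also covers images that were generated and then filtered out.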
