FLUX-Reason-6M and PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Abstract

The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap relative to leading closed-source systems. To address this challenge, we introduce FLUX-Reason-6M and PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark). FLUX-Reason-6M is a massive dataset consisting of 6 million high-quality FLUX-generated images and 20 million bilingual (English and Chinese) descriptions specifically designed to teach complex reasoning. The images are organized according to six key characteristics: Imagination, Entity, Text rendering, Style, Affection, and Composition. We also design an explicit Generation Chain-of-Thought (GCoT) that provides detailed breakdowns of the image generation steps. The full data curation process took 15,000 A100 GPU days, providing the community with a resource previously unattainable outside of large industrial labs. PRISM-Bench offers a novel evaluation standard with seven distinct tracks, including a formidable Long Text challenge that uses GCoT. Through carefully designed prompts, it uses advanced vision-language models for nuanced, human-aligned assessment of prompt-image alignment and image aesthetics. Our extensive evaluation of 19 leading models on PRISM-Bench reveals critical performance gaps and highlights specific areas requiring improvement. Our dataset, benchmark, and evaluation code are released to catalyze the next wave of reasoning-oriented T2I generation. Project page: https://flux-reason-6m.github.io/