OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

June 9, 2025
作者: Jingjing Chang, Yixiao Fang, Peng Xing, Shuhan Wu, Wei Cheng, Rui Wang, Xianfang Zeng, Gang Yu, Hai-Bao Chen
cs.AI

Abstract

Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, the rapid advancement of T2I models has exposed the limitations of early benchmarks, which lack comprehensive evaluation of key dimensions such as reasoning, text rendering, and style. Notably, recent state-of-the-art models, with their rich knowledge modeling capabilities, show promising results on image generation tasks that demand strong reasoning ability, yet existing evaluation systems have not adequately addressed this frontier. To systematically close these gaps, we introduce OneIG-Bench, a meticulously designed, comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation along these dimensions, the benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks across the full image generation pipeline. In particular, OneIG-Bench supports flexible evaluation: rather than generating images for the entire prompt set, users can generate images only for the prompts associated with a selected dimension and run the corresponding evaluation, as sketched in the example below. Our codebase and dataset are publicly available to facilitate reproducible evaluation studies and cross-model comparisons within the T2I research community.
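To make the subset-based workflow concrete, the sketch below shows how a per-dimension evaluation loop might look. It is a minimal illustration under stated assumptions, not the released OneIG-Bench API: the prompt-file layout, the `dimension` tag, the `generate()` interface, and the placeholder scorers are all hypothetical names introduced here for clarity.

```python
import json
from pathlib import Path
from typing import Callable, Protocol


class T2IModel(Protocol):
    """Any text-to-image model exposing a single generate() call (assumed interface)."""
    def generate(self, prompt: str) -> object: ...


# Placeholder scorers keyed by the five axes named in the abstract; real
# metrics (e.g. alignment or text-rendering scores) would replace these stubs.
SCORERS: dict[str, Callable[[object, dict], float]] = {
    "alignment": lambda image, item: 0.0,
    "text_rendering": lambda image, item: 0.0,
    "reasoning": lambda image, item: 0.0,
    "stylization": lambda image, item: 0.0,
    "diversity": lambda image, item: 0.0,
}


def evaluate_dimension(model: T2IModel, prompt_file: Path, dimension: str) -> float:
    """Generate and score images only for prompts tagged with `dimension`."""
    scorer = SCORERS[dimension]  # raises KeyError for an unknown dimension
    prompts = json.loads(prompt_file.read_text(encoding="utf-8"))
    subset = [p for p in prompts if p["dimension"] == dimension]
    if not subset:
        raise ValueError(f"no prompts tagged with dimension {dimension!r}")
    scores = [scorer(model.generate(p["prompt"]), p) for p in subset]
    return sum(scores) / len(scores)
```

Because each prompt carries its dimension tag, filtering happens before any image is generated, which is what makes evaluating a single dimension cheap relative to running the full benchmark.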