BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

March 26, 2026
Authors: Yan Li, Zezi Zeng, Ziwei Zhou, Xin Gao, Muzhao Tian, Yifan Yang, Mingxi Cheng, Qi Dai, Yuqing Yang, Lili Qiu, Zhendong Wang, Zhengyuan Yang, Xue Yang, Lijuan Wang, Ji Li, Chong Luo
cs.AI

Abstract

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types: slides, charts, webpages, posters, and scientific figures, and evaluates four key capability dimensions: text rendering, layout control, attribute binding, and knowledge-based reasoning, forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and semantic constraints. We conduct large-scale benchmarking on 26 popular image generation systems, including state-of-the-art commercial APIs and leading open-source models. The results reveal substantial capability gaps between current generative models and the requirements of professional visual content creation. We hope BizGenEval serves as a standardized benchmark for real-world commercial visual content generation.
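
The task grid described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document types and capability dimensions come from the abstract, but the names `DOCUMENT_TYPES`, `CAPABILITIES`, `TASKS`, and `checklist_score` are hypothetical, and scoring an image as the fraction of satisfied yes/no checklist questions is an assumed metric, not one the abstract specifies.

```python
from itertools import product

# Document types and capability dimensions as listed in the abstract.
DOCUMENT_TYPES = ["slides", "charts", "webpages", "posters", "scientific figures"]
CAPABILITIES = [
    "text rendering",
    "layout control",
    "attribute binding",
    "knowledge-based reasoning",
]

# Each (document type, capability) pair is one evaluation task: 5 x 4 = 20.
TASKS = list(product(DOCUMENT_TYPES, CAPABILITIES))

# Averages implied by the reported totals (the actual split per task is not stated).
NUM_PROMPTS = 400
NUM_QUESTIONS = 8000
AVG_PROMPTS_PER_TASK = NUM_PROMPTS / len(TASKS)
AVG_QUESTIONS_PER_PROMPT = NUM_QUESTIONS / NUM_PROMPTS


def checklist_score(answers):
    """Fraction of yes/no checklist questions satisfied (hypothetical metric)."""
    if not answers:
        raise ValueError("need at least one checklist answer")
    return sum(bool(a) for a in answers) / len(answers)


if __name__ == "__main__":
    print(len(TASKS), AVG_PROMPTS_PER_TASK, AVG_QUESTIONS_PER_PROMPT)
    print(checklist_score([True, True, False, True]))
```

The reported totals work out to an average of 20 prompts per task and 20 checklist questions per prompt; whether the benchmark distributes them evenly is not stated in the abstract.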