BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

Abstract

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks focus mainly on natural image synthesis and do not systematically evaluate models under the structured, multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types (slides, charts, webpages, posters, and scientific figures) and evaluates four key capability dimensions (text rendering, layout control, attribute binding, and knowledge-based reasoning), forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8,000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and semantic constraints. We conduct large-scale benchmarking on 26 popular image generation systems, including state-of-the-art commercial APIs and leading open-source models. The results reveal substantial capability gaps between current generative models and the requirements of professional visual content creation. We hope BizGenEval will serve as a standardized benchmark for real-world commercial visual content generation.
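
The benchmark's size follows from simple arithmetic on the figures above, which the short Python sketch below makes explicit. It enumerates the 5 x 4 grid of document types and capability dimensions and derives the average number of prompts per task and checklist questions per prompt. The variable names are illustrative, and the even split across tasks and prompts is an assumption: the abstract reports only the totals.

```python
from itertools import product

# Document types and capability dimensions as listed in the abstract.
DOCUMENT_TYPES = ["slides", "charts", "webpages", "posters", "scientific figures"]
CAPABILITIES = [
    "text rendering",
    "layout control",
    "attribute binding",
    "knowledge-based reasoning",
]

# Totals reported in the abstract.
TOTAL_PROMPTS = 400
TOTAL_QUESTIONS = 8000

# Each (document type, capability) pair corresponds to one evaluation task.
tasks = list(product(DOCUMENT_TYPES, CAPABILITIES))

# Averages only: an even distribution is an assumption, not stated in the abstract.
prompts_per_task = TOTAL_PROMPTS / len(tasks)
questions_per_prompt = TOTAL_QUESTIONS / TOTAL_PROMPTS

print(f"tasks: {len(tasks)}")
print(f"average prompts per task: {prompts_per_task}")
print(f"average checklist questions per prompt: {questions_per_prompt}")
```

Under these assumptions, the grid yields 20 tasks, with an average of 20 prompts per task and 20 checklist questions per prompt.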