WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation
May 2, 2025
Authors: Daoan Zhang, Che Jiang, Ruoshi Xu, Biaoxiang Chen, Zijian Jin, Yutian Lu, Jianguo Zhang, Liang Yong, Jiebo Luo, Shengda Luo
cs.AI
Abstract
Recent advances in text-to-image (T2I) generation have achieved impressive
results, yet existing models still struggle with prompts that require rich
world knowledge and implicit reasoning, both of which are critical for
producing semantically accurate, coherent, and contextually appropriate images
in real-world scenarios. To address this gap, we introduce
WorldGenBench, a benchmark designed to systematically evaluate T2I
models' world knowledge grounding and implicit inferential capabilities,
covering both the humanities and nature domains. We propose the
Knowledge Checklist Score, a structured metric that measures how well
generated images satisfy key semantic expectations. Experiments across 21
state-of-the-art models reveal that while diffusion models lead among
open-source methods, proprietary auto-regressive models like GPT-4o exhibit
significantly stronger reasoning and knowledge integration. Our findings
highlight the need for deeper understanding and inference capabilities in
next-generation T2I systems. Project page:
https://dwanzhang-ai.github.io/WorldGenBench/
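
The abstract describes the Knowledge Checklist Score only as a structured metric of how well a generated image satisfies key semantic expectations. The sketch below is one plausible reading, not the authors' implementation: it assumes each prompt carries a checklist of expected elements, that some judge has already marked each element as satisfied or not, and that the score is the satisfied fraction. The names `ChecklistItem` and `knowledge_checklist_score`, and the example items, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChecklistItem:
    """One expected element of an image (hypothetical structure)."""
    description: str
    satisfied: bool  # assumed to come from a human or model judge


def knowledge_checklist_score(items: Sequence[ChecklistItem]) -> float:
    """Return the fraction of checklist items marked as satisfied.

    Assumption: equal weight per item. The benchmark itself may weight
    or aggregate items differently.
    """
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(item.satisfied for item in items) / len(items)


# Illustrative checklist for a single generated image (made-up items).
example = [
    ChecklistItem("period-appropriate clothing", True),
    ChecklistItem("correct regional architecture", True),
    ChecklistItem("plausible vegetation for the climate", False),
    ChecklistItem("no anachronistic objects", False),
]
score = knowledge_checklist_score(example)
```

Under these assumptions, averaging the per-image score over all prompts would give a model-level number that can be compared across systems.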