WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

Abstract

Recent image editing models exhibit a new level of intelligence, enabling cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to assess these advanced abilities holistically. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for the comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps, namely Awareness, Interpretation, and Imagination, each paired with a task that challenges models at that specific step. It also includes complex tasks, in which none of the three steps can be completed easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases, which objectively reveal the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition. The benchmark, evaluation code, and the images generated by each model will be made publicly available soon. Project page: https://qnancy.github.io/wiseedit_project_page/.