KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Abstract

Recent advances in multimodal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for editing tasks that require knowledge-based reasoning remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing on educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.