KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

May 22, 2025
Authors: Yongliang Wu, Zonghui Li, Xinting Hu, Xinyu Ye, Xianfang Zeng, Gang Yu, Wenbo Zhu, Bernt Schiele, Ming-Hsuan Yang, Xu Yang
cs.AI

Abstract

Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based reasoning editing tasks remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing from educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.
