

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing

May 16, 2025
Authors: Yusu Qian, Jiasen Lu, Tsu-Jui Fu, Xinze Wang, Chen Chen, Yinfei Yang, Wenze Hu, Zhe Gan
cs.AI

Abstract

Editing images with natural language instructions has become a natural and expressive way to modify visual content, yet evaluating the performance of such models remains challenging. Existing evaluation approaches often rely on image-text similarity metrics such as CLIP, which lack precision. In this work, we introduce a new benchmark that evaluates text-guided image editing models in a more grounded manner along two critical dimensions: (i) functional correctness, assessed via automatically generated multiple-choice questions that verify whether the intended change was successfully applied; and (ii) image content preservation, which uses an object-aware masking technique and preservation scoring to check that non-targeted regions of the image remain visually consistent. The benchmark includes over 1,000 high-quality editing examples across 20 diverse content categories, each annotated with detailed editing instructions, evaluation questions, and spatial object masks. We conduct a large-scale study comparing GPT-Image-1, the latest flagship model in text-guided image editing, against several state-of-the-art editing models, and we validate our automatic metrics against human ratings. Results show that GPT-Image-1 leads in instruction-following accuracy but often over-modifies irrelevant image regions, highlighting a key trade-off in current model behavior. GIE-Bench provides a scalable, reproducible framework for more accurate evaluation of text-guided image editing.
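The abstract does not give the exact scoring formulas. The sketch below is a minimal illustration of how the two dimensions could be scored. It assumes two things: a judge model's answers to the multiple-choice questions are scored as plain accuracy, and preservation is measured as PSNR over the pixels outside the object mask. PSNR is one common choice, and the benchmark may use other or additional preservation metrics. The function names and input formats (`functional_correctness`, `masked_psnr`, flat pixel lists, a boolean edit mask) are hypothetical.

```python
import math
from collections import defaultdict


def functional_correctness(records):
    """Per-category and overall accuracy of judge answers to multiple-choice questions.

    records: iterable of (category, predicted_choice, correct_choice) tuples
    (hypothetical format; one tuple per evaluation question).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, predicted, correct in records:
        totals[category] += 1
        hits[category] += int(predicted == correct)
    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_category, overall


def masked_psnr(original, edited, edit_mask, max_value=255.0):
    """PSNR restricted to pixels outside the edit mask (assumed preservation score).

    original, edited: equal-length flat sequences of pixel intensities.
    edit_mask: same length; True marks pixels the instruction targets.
    Those pixels are excluded so that only the region meant to stay unchanged is scored.
    """
    if not (len(original) == len(edited) == len(edit_mask)):
        raise ValueError("inputs must have the same length")
    diffs = [(o - e) ** 2 for o, e, m in zip(original, edited, edit_mask) if not m]
    if not diffs:
        raise ValueError("mask covers the whole image; nothing to preserve")
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # non-target region is pixel-identical
    return 10 * math.log10(max_value ** 2 / mse)
```

A model that applies the requested change but also alters pixels outside the mask keeps its multiple-choice accuracy while its masked PSNR drops. This separation is what lets the two dimensions expose the trade-off the abstract reports for GPT-Image-1.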
