SpotEdit: Evaluating Visually-Guided Image Editing Methods
August 25, 2025
Authors: Sara Ghazanfari, Wei-An Lin, Haitong Tian, Ersin Yumer
cs.AI
Abstract
Visually-guided image editing, where edits are conditioned on both visual
cues and textual prompts, has emerged as a powerful paradigm for fine-grained,
controllable content generation. Although recent generative models have shown
remarkable capabilities, existing evaluations remain simple and insufficiently
representative of real-world editing challenges. We present SpotEdit, a
comprehensive benchmark designed to systematically assess visually-guided image
editing methods across diverse diffusion, autoregressive, and hybrid generative
models, uncovering substantial performance disparities. To address a critical
yet underexplored challenge, our benchmark includes a dedicated component on
hallucination, highlighting how leading models, such as GPT-4o, often
hallucinate the existence of a visual cue and erroneously perform the editing
task. Our code and benchmark are publicly released at
https://github.com/SaraGhazanfari/SpotEdit.