SpotEdit: Evaluating Visually-Guided Image Editing Methods

Abstract

Visually-guided image editing, where edits are conditioned on both visual cues and textual prompts, has emerged as a powerful paradigm for fine-grained, controllable content generation. Although recent generative models have shown remarkable capabilities, existing evaluations remain overly simple and fail to capture the challenges of real-world editing. We present SpotEdit, a comprehensive benchmark designed to systematically assess visually-guided image editing methods across diverse diffusion, autoregressive, and hybrid generative models; our evaluation uncovers substantial performance disparities among them. To address a critical yet underexplored challenge, hallucination, our benchmark includes a dedicated component showing how leading models such as GPT-4o often hallucinate the existence of a visual cue and erroneously perform the editing task. Our code and benchmark are publicly released at https://github.com/SaraGhazanfari/SpotEdit.