StableI2I: Spotting Unintended Changes in Image-to-Image Transition

Abstract

In most real-world image-to-image (I2I) scenarios, existing evaluations focus primarily on instruction following and on the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre-post consistency across a wide range of I2I tasks, including image editing and image restoration, without requiring reference images. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of multimodal large language models (MLLMs) on such fidelity and consistency assessment tasks. Extensive experimental results demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency that correlate strongly with human subjective judgments. Our framework serves as a practical and reliable evaluation tool for diagnosing content consistency issues and benchmarking model performance in real-world I2I systems.
