StableI2I: Accurately Capturing Unintended Changes in Image-to-Image Translation

Abstract

In most real-world image-to-image (I2I) scenarios, existing evaluations focus primarily on instruction following and on the perceptual quality or aesthetics of the generated images. They largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre-/post-transformation consistency across a wide range of I2I tasks, including image editing and image restoration, without requiring reference images. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of multimodal large language models (MLLMs) on such fidelity and consistency assessment tasks. Extensive experiments show that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency that correlate strongly with human subjective judgments. The framework serves as a practical and reliable tool for diagnosing content-consistency problems in real-world I2I systems and for benchmarking model performance.