Rethinking Image Evaluation in Super-Resolution
March 17, 2025
Authors: Shaolin Su, Josep M. Rocafort, Danna Xue, David Serrano-Lozano, Lei Sun, Javier Vazquez-Corral
cs.AI
Abstract
While recent image super-resolution (SR) techniques are continually
improving the perceptual quality of their outputs, they often perform poorly in
quantitative evaluations. This inconsistency leads to a growing distrust in
existing image metrics for SR evaluations. Though image evaluation depends on
both the metric and the reference ground truth (GT), researchers typically do
not inspect the role of GTs, as they are generally accepted as 'perfect'
references. However, because the data were collected years ago and
other types of distortion were not controlled, we point out that GTs in
existing SR datasets can exhibit relatively poor quality, which leads to biased
evaluations. Following this observation, in this paper, we investigate
the following questions: Are GT images in existing SR datasets 100% trustworthy
for model evaluations? How does GT quality affect this evaluation? And how can
fair evaluations be made when GTs are imperfect? To answer these questions,
this paper presents two main contributions. First, by systematically analyzing
seven state-of-the-art SR models across three real-world SR datasets, we show
that low-quality GTs consistently affect SR performance across models,
and that models can perform quite differently when GT quality is controlled.
Second, we propose a novel perceptual quality metric, Relative Quality Index
(RQI), that measures the relative quality discrepancy of image pairs, thus
addressing the biased evaluations caused by unreliable GTs. Our proposed model
achieves significantly better consistency with human opinions. We expect our
work to provide insights for the SR community on how future datasets, models,
and metrics should be developed.