

Towards Open-ended Visual Quality Comparison

February 26, 2024
Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin
cs.AI

Abstract

Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend emerging large multi-modality models (LMMs) to advance visual quality comparison into open-ended settings that 1) can respond to open-range questions on quality comparison, and 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source, open-ended visual quality comparer, we collect the Co-Instruct-562K dataset from two sources: (a) LMM-merged single-image quality descriptions, and (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves 30% higher superior accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.
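
The released checkpoint can be loaded from the Hugging Face Hub repository named in the abstract. Below is a minimal sketch, assuming the repository ships a transformers-compatible remote-code model; the exact inference interface (prompt format, image preprocessing) is not specified here and should be taken from the model card.

```python
# Minimal sketch for loading the released Co-Instruct checkpoint.
# Assumptions: q-future/co-instruct ships transformers remote code, and
# inference follows the model card rather than anything shown here.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "q-future/co-instruct",   # repository cited in the abstract
    trust_remote_code=True,   # custom multi-image LMM architecture
    torch_dtype=torch.float16,
    device_map="auto",
)
```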