Towards Open-ended Visual Quality Comparison
February 26, 2024
Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin
cs.AI
Abstract
Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the capabilities of emerging large multi-modality models (LMMs) to further advance visual quality comparison into open-ended settings, which 1) can respond to open-range questions on quality comparison, and 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparator, we collect the Co-Instruct-562K dataset from two sources: (a) LMM-merged single-image quality descriptions, and (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.
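
For readers who want to try the released comparator, below is a minimal usage sketch. The loading call follows the standard Hugging Face transformers trust_remote_code pattern; the chat() helper, "USER: ... ASSISTANT:" prompt template, and <|image|> placeholders are assumptions based on the linked checkpoint's custom code (which builds on mPLUG-Owl2) and may differ across revisions. The local image file names are hypothetical.

```python
# Minimal sketch: asking Co-Instruct an open-ended pairwise quality question.
# Assumes the checkpoint exposes a chat() helper via trust_remote_code;
# verify the exact interface against the model card before relying on it.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "q-future/co-instruct",
    trust_remote_code=True,       # loads the custom multi-image architecture
    torch_dtype=torch.float16,
    attn_implementation="eager",
    device_map={"": "cuda:0"},
)

# One <|image|> placeholder per input image, in the order images are passed.
prompt = (
    "USER: The first image: <|image|>\nThe second image: <|image|>\n"
    "Which image has better overall quality, and why? ASSISTANT:"
)
images = [Image.open("image_a.jpg"), Image.open("image_b.jpg")]  # hypothetical files

# Returns a free-form answer: a choice plus the detailed reasoning behind it.
print(model.chat(prompt, images, max_new_tokens=200))
```

The same prompt pattern extends to the multi-image setting evaluated in MICBench: add one <|image|> placeholder per additional image and phrase the question over the whole group.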