Towards Open-ended Visual Quality Comparison
February 26, 2024
Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin
cs.AI
Abstract
Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the capabilities of emerging large multi-modality models (LMMs) to further advance visual quality comparison into open-ended settings, which 1) can respond to open-range questions on quality comparison, and 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparator, we collect the Co-Instruct-562K dataset from two sources: (a) LMM-merged single-image quality descriptions, and (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.
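
For readers who want to try the released comparator, below is a minimal usage sketch. The loading call follows the standard Hugging Face transformers trust_remote_code pattern; the chat() helper, "USER: ... ASSISTANT:" prompt template, and <|image|> placeholders are assumptions based on the linked checkpoint's custom code (which builds on mPLUG-Owl2) and may differ across revisions. The local image file names are hypothetical.

```python
# Minimal sketch: asking Co-Instruct an open-ended pairwise quality question.
# Assumes the checkpoint exposes a chat() helper via trust_remote_code;
# verify the exact interface against the model card before relying on it.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "q-future/co-instruct",
    trust_remote_code=True,       # loads the custom multi-image architecture
    torch_dtype=torch.float16,
    attn_implementation="eager",
    device_map={"": "cuda:0"},
)

# One <|image|> placeholder per input image, in the order images are passed.
prompt = (
    "USER: The first image: <|image|>\nThe second image: <|image|>\n"
    "Which image has better overall quality, and why? ASSISTANT:"
)
images = [Image.open("image_a.jpg"), Image.open("image_b.jpg")]  # hypothetical files

# Returns a free-form answer: a choice plus the detailed reasoning behind it.
print(model.chat(prompt, images, max_new_tokens=200))
```

The same prompt pattern extends to the multi-image setting evaluated in MICBench: add one <|image|> placeholder per additional image and phrase the question over the whole group.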