Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

March 4, 2025
Authors: Zicheng Zhang, Tengchuan Kou, Shushi Wang, Chunyi Li, Wei Sun, Wei Wang, Xiaoyu Li, Zongyu Wang, Xuezhi Cao, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai
cs.AI

Abstract

Evaluating text-to-vision content hinges on two crucial aspects: visual quality and alignment. While significant progress has been made in developing objective models to assess these dimensions, the performance of such models relies heavily on the scale and quality of human annotations. According to scaling laws, the performance of evaluation models improves in a predictable pattern as the number of human-labeled instances increases. Therefore, we introduce a comprehensive dataset designed to Evaluate Visual quality and Alignment Level for text-to-vision content (Q-EVAL-100K), featuring the largest collection of human-labeled Mean Opinion Scores (MOS) for these two aspects. The Q-EVAL-100K dataset encompasses both text-to-image and text-to-video models, with 960K human annotations specifically focused on visual quality and alignment for 100K instances (60K images and 40K videos). Leveraging this dataset with context prompts, we propose Q-Eval-Score, a unified model capable of evaluating both visual quality and alignment, with special improvements for handling long-text prompt alignment. Experimental results indicate that the proposed Q-Eval-Score achieves superior performance on both visual quality and alignment, with strong generalization capabilities across other benchmarks. These findings highlight the significant value of the Q-EVAL-100K dataset. Data and code will be available at https://github.com/zzc-1998/Q-Eval.
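
To make the Mean Opinion Score (MOS) and the annotation counts in the abstract concrete, here is a minimal sketch in Python. It assumes the simplest definition of MOS, the arithmetic mean of the ratings an instance receives; the abstract does not say how Q-EVAL-100K aggregates or normalizes its ratings. The function name, the 1-5 scale, and the sample ratings are all hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the plain arithmetic mean of one instance's ratings.

    Assumption: no rater normalization or outlier rejection is applied.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    return mean(ratings)


# Hypothetical ratings on a 1-5 scale for a single generated image.
quality_ratings = [4, 5, 3, 4]
alignment_ratings = [2, 3, 3, 2]

# Each instance gets one MOS per dimension.
quality_mos = mean_opinion_score(quality_ratings)
alignment_mos = mean_opinion_score(alignment_ratings)

# Figures taken from the abstract: 960K annotations over 100K instances.
annotations_per_instance = 960_000 / 100_000

print(quality_mos, alignment_mos, annotations_per_instance)
```

On these figures, each instance receives 9.6 annotations on average, counted across both the visual quality and alignment dimensions.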