MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

June 21, 2024
Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
cs.AI

Abstract

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind: none of the existing metrics can provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train MantisScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between MantisScore and human ratings reaches 77.1 on VideoFeedback-test, beating the prior best metric by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that MantisScore consistently correlates much more strongly with human judges than other metrics do. Given these results, we believe MantisScore can serve as a strong proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.
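
For readers unfamiliar with the headline number, the following is a minimal sketch of how a Spearman correlation between a metric's scores and human ratings can be computed. It is not the authors' evaluation code. The ratings and scores below are made-up illustrative values, the function names are hypothetical, and the sketch assumes the reported 77.1 is the coefficient multiplied by 100.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one human rating and one metric score per video.
human_ratings = [1, 2, 2, 3, 4, 4]
metric_scores = [1.2, 2.1, 1.9, 3.3, 3.0, 3.8]

rho = spearman(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.3f} (x100 = {100 * rho:.1f})")
```

Because Spearman correlation depends only on rank order, a metric is rewarded for ordering videos the way human raters do, regardless of the scale of its raw scores.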
