
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

June 21, 2024
Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
cs.AI

Abstract

Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind. No existing metric is able to provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train MantisScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between MantisScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that MantisScore has a consistently much higher correlation with human judges than other metrics. Given these results, we believe MantisScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.
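
The headline result is a Spearman rank correlation between metric scores and human ratings. As a minimal sketch of how such a number can be computed, the snippet below ranks both score lists (averaging ranks over ties) and takes the Pearson correlation of the ranks. The function names and the toy scores are hypothetical and are not taken from the paper or its code; the reported 77.1 is presumably the coefficient multiplied by 100.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: human ratings and an automatic metric for 4 videos.
human_scores = [1, 2, 3, 4]
metric_scores = [1.2, 2.9, 2.1, 3.8]
rho = spearman(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f} (x100 = {100 * rho:.1f})")
```

Because only the rank order matters, a metric can score on any scale and still be compared fairly against human ratings, which is one reason rank correlation is a common choice for this kind of evaluation.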
