
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

June 13, 2024
Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang
cs.AI

Abstract

Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. However, existing manual evaluation protocols face reproducibility, reliability, and practicality issues. To address these challenges, this paper introduces the Text-to-Video Human Evaluation (T2VHE) protocol, a comprehensive and standardized protocol for T2V models. The T2VHE protocol includes well-defined metrics, thorough annotator training, and an effective dynamic evaluation module. Experimental results demonstrate that this protocol not only ensures high-quality annotations but can also reduce evaluation costs by nearly 50%. We will open-source the entire setup of the T2VHE protocol, including the complete protocol workflow, the dynamic evaluation component details, and the annotation interface code. This will help communities establish more sophisticated human assessment protocols.
