Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

June 13, 2024
Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang
cs.AI

Abstract

Recent advancements in text-to-video (T2V) technology, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models remains substantially challenging. Because automatic metrics have inherent limitations, manual evaluation is often considered a superior method for assessing T2V generation. However, existing manual evaluation protocols suffer from problems of reproducibility, reliability, and practicality. To address these challenges, this paper introduces the Text-to-Video Human Evaluation (T2VHE) protocol, a comprehensive and standardized protocol for T2V models. The T2VHE protocol includes well-defined metrics, thorough annotator training, and an effective dynamic evaluation module. Experimental results demonstrate that this protocol not only ensures high-quality annotations but can also reduce evaluation costs by nearly 50%. We will open-source the entire setup of the T2VHE protocol, including the complete protocol workflow, the details of the dynamic evaluation component, and the annotation interface code. This will help the community establish more sophisticated human evaluation protocols.