

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

July 19, 2024
Authors: Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu
cs.AI

Abstract

Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks have also neglected to evaluate this important ability. In this work, we conduct the first systematic study of compositional text-to-video generation and propose T2V-CompBench, the first benchmark tailored for this task. T2V-CompBench covers diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further carefully design MLLM-based, detection-based, and tracking-based evaluation metrics that better reflect compositional generation quality across the seven proposed categories, which comprise 700 text prompts in total. The effectiveness of the proposed metrics is verified by their correlation with human evaluations. We also benchmark various text-to-video generation models and conduct an in-depth analysis across models and compositional categories. We find that compositional text-to-video generation is highly challenging for current models, and we hope our attempt sheds light on future research in this direction.
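
To illustrate how a detection-based metric might work for the generative-numeracy category, the sketch below counts instances of a target object class in each sampled frame and scores a video by how often the detected count matches the count specified in the prompt. The detector interface (`detect_objects`), the frame representation, and the scoring rule are all hypothetical stand-ins, not the benchmark's actual implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical detector interface: given a frame, return the detected
# class label for each object instance (e.g. from an off-the-shelf
# open-vocabulary detector). This is an assumed stand-in, not the
# detector used by T2V-CompBench.
Detector = Callable[[object], Sequence[str]]

def numeracy_score(
    frames: List[object],
    target_class: str,
    target_count: int,
    detect_objects: Detector,
) -> float:
    """Fraction of sampled frames whose detected count of
    `target_class` matches the count specified in the prompt."""
    if not frames:
        return 0.0
    hits = 0
    for frame in frames:
        labels = detect_objects(frame)
        count = sum(1 for label in labels if label == target_class)
        if count == target_count:
            hits += 1
    return hits / len(frames)
```

For a prompt like "three dogs running on a beach", one would sample frames from the generated video and call `numeracy_score(frames, "dog", 3, detector)`; averaging over all numeracy prompts would yield a per-model score for this category.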
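The abstract states that the metrics are validated by their correlation with human evaluations. A minimal sketch of that validation step, assuming paired per-video scores from an automatic metric and from human raters, could use rank correlation from SciPy (Spearman and Kendall are common choices; the paper's exact protocol may differ):

```python
from scipy import stats

# Paired scores for the same set of generated videos: one automatic
# metric score and one (averaged) human rating per video.
# The numbers below are made-up placeholders, not benchmark data.
metric_scores = [0.82, 0.41, 0.67, 0.15, 0.90, 0.55]
human_scores = [4.5, 2.0, 3.5, 1.0, 5.0, 3.0]

# Rank correlations: higher values mean the metric orders videos
# more consistently with human judgment.
rho, rho_p = stats.spearmanr(metric_scores, human_scores)
tau, tau_p = stats.kendalltau(metric_scores, human_scores)

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3g})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3g})")
```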
