MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
July 8, 2024
Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan
cs.AI
Abstract
Sora's long, consistent videos with high motion intensity have significantly
impacted the field of video generation, attracting unprecedented attention.
However, existing publicly available datasets are inadequate for generating
Sora-like videos, as they mainly contain short videos with low motion intensity
and brief captions. To address these issues, we propose MiraData, a
high-quality video dataset that surpasses previous ones in video duration,
caption detail, motion strength, and visual quality. We curate MiraData from
diverse, manually selected sources and meticulously process the data to obtain
semantically consistent clips. GPT-4V is employed to annotate structured
captions, providing detailed descriptions from four different perspectives
along with a summarized dense caption. To better assess temporal consistency
and motion intensity in video generation, we introduce MiraBench, which
enhances existing benchmarks by adding 3D consistency and tracking-based motion
strength metrics. MiraBench includes 150 evaluation prompts and 17 metrics
covering temporal consistency, motion strength, 3D consistency, visual quality,
text-video alignment, and distribution similarity. To demonstrate the utility
and effectiveness of MiraData, we conduct experiments using our DiT-based video
generation model, MiraDiT. The experimental results on MiraBench demonstrate
the superiority of MiraData, especially in motion strength.