MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

July 8, 2024
Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan
cs.AI

Abstract

Sora's high motion intensity and long, temporally consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous datasets in video duration, caption detail, motion strength, and visual quality. We curate MiraData from diverse, manually selected sources and meticulously process the data to obtain semantically consistent clips. GPT-4V is employed to annotate structured captions, providing detailed descriptions from four different perspectives along with a summarized dense caption. To better assess temporal consistency and motion intensity in video generation, we introduce MiraBench, which enhances existing benchmarks by adding 3D consistency and tracking-based motion strength metrics. MiraBench includes 150 evaluation prompts and 17 metrics covering temporal consistency, motion strength, 3D consistency, visual quality, text-video alignment, and distribution similarity. To demonstrate the utility and effectiveness of MiraData, we conduct experiments using our DiT-based video generation model, MiraDiT. The experimental results on MiraBench demonstrate the superiority of MiraData, especially in motion strength.
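The abstract does not define MiraBench's tracking-based motion strength metric. To make the idea concrete, here is a minimal sketch under stated assumptions: motion strength is taken to be the average Euclidean displacement of tracked points between consecutive frames, and the point tracks are assumed to come from some external point tracker that is not shown. The function name `motion_strength` and the `tracks[t][i]` layout are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def motion_strength(tracks: List[List[Point]]) -> float:
    """Hypothetical tracking-based motion score (not MiraBench's exact formula).

    tracks[t][i] is the (x, y) position of tracked point i in frame t,
    assumed to be produced by an external point tracker. The score is the
    mean displacement per point per frame transition.
    """
    if len(tracks) < 2:
        return 0.0  # a single frame has no motion
    n_points = len(tracks[0])
    if n_points == 0:
        return 0.0
    total = 0.0
    # Sum the displacement of every point across each consecutive frame pair.
    for prev, cur in zip(tracks, tracks[1:]):
        if len(cur) != n_points:
            raise ValueError("every frame must track the same number of points")
        for (x0, y0), (x1, y1) in zip(prev, cur):
            total += math.hypot(x1 - x0, y1 - y0)
    # Average over all (transition, point) pairs.
    return total / ((len(tracks) - 1) * n_points)
```

For example, a single point that moves by (3, 4) pixels in each of two frame transitions scores 5.0, while a static scene scores 0.0. A production metric would likely also handle occlusion (points that the tracker marks invisible) and normalize by frame resolution so that scores are comparable across videos; this sketch does neither.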

November 28, 2024