

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

November 13, 2024
Authors: Xiaofeng Wang, Kang Zhao, Feng Liu, Jiayu Wang, Guosheng Zhao, Xiaoyi Bao, Zheng Zhu, Yingya Zhang, Xingang Wang
cs.AI

Abstract

Video generation has emerged as a promising tool for world simulation, leveraging visual data to replicate real-world environments. Within this context, egocentric video generation, which centers on the human perspective, holds significant potential for enhancing applications in virtual reality, augmented reality, and gaming. However, the generation of egocentric videos presents substantial challenges due to the dynamic nature of egocentric viewpoints, the intricate diversity of actions, and the complex variety of scenes encountered. Existing datasets are inadequate for addressing these challenges effectively. To bridge this gap, we present EgoVid-5M, the first high-quality dataset specifically curated for egocentric video generation. EgoVid-5M encompasses 5 million egocentric video clips and is enriched with detailed action annotations, including fine-grained kinematic control and high-level textual descriptions. To ensure the integrity and usability of the dataset, we implement a sophisticated data cleaning pipeline designed to maintain frame consistency, action coherence, and motion smoothness under egocentric conditions. Furthermore, we introduce EgoDreamer, which is capable of generating egocentric videos driven simultaneously by action descriptions and kinematic control signals. The EgoVid-5M dataset, associated action annotations, and all data cleansing metadata will be released for the advancement of research in egocentric video generation.
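The abstract names three quality criteria for the data cleaning pipeline (frame consistency, action coherence, motion smoothness) but does not describe its implementation. A minimal sketch of how threshold-based clip filtering on such per-clip quality scores could work is shown below; all class names, field names, and threshold values are hypothetical illustrations, not the authors' actual pipeline:

```python
from dataclasses import dataclass


@dataclass
class ClipMetadata:
    """Hypothetical per-clip quality scores, each normalized to [0, 1]."""
    clip_id: str
    frame_consistency: float   # e.g. mean feature similarity between adjacent frames
    action_coherence: float    # e.g. agreement between clip content and its action label
    motion_smoothness: float   # e.g. smoothness of the estimated egocentric camera motion


def passes_cleaning(meta: ClipMetadata,
                    min_consistency: float = 0.80,
                    min_coherence: float = 0.50,
                    min_smoothness: float = 0.70) -> bool:
    """Keep a clip only if it clears every quality threshold (thresholds are illustrative)."""
    return (meta.frame_consistency >= min_consistency
            and meta.action_coherence >= min_coherence
            and meta.motion_smoothness >= min_smoothness)


clips = [
    ClipMetadata("ego_0001", 0.91, 0.72, 0.85),
    ClipMetadata("ego_0002", 0.62, 0.88, 0.90),  # inconsistent frames -> filtered out
]
kept = [c.clip_id for c in clips if passes_cleaning(c)]
```

In practice each score would be computed by a separate model or heuristic over the raw video; the point of the sketch is only that cleaning reduces to conjunctive thresholding over per-clip metadata, which scales easily to millions of clips.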

