ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
January 10, 2024
Authors: Kevin Cai, Chonghua Liu, David M. Chan
cs.AI
Abstract
The Internet's wealth of content, with up to 60% published in English,
starkly contrasts the global population, where only 18.8% are English speakers,
and just 5.1% consider it their native language, leading to disparities in
online information access. Unfortunately, automated processes for dubbing of
video - replacing the audio track of a video with a translated alternative -
remain complex and challenging tasks, necessitating precise timing, facial
movement synchronization, and prosody matching. While end-to-end
dubbing offers a solution, data scarcity continues to impede the progress of
both end-to-end and pipeline-based methods. In this work, we introduce
Anim-400K, a comprehensive dataset of over 425K aligned animated video segments
in Japanese and English supporting various video-related tasks, including
automated dubbing, simultaneous translation, guided video summarization, and
genre/theme/style classification. Our dataset is made publicly available for
research purposes at https://github.com/davidmchan/Anim400K.