ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

January 10, 2024
作者: Kevin Cai, Chonghua Liu, David M. Chan
cs.AI

Abstract

The Internet's wealth of content, with up to 60% published in English, starkly contrasts with the global population, where only 18.8% are English speakers and just 5.1% consider it their native language, leading to disparities in online information access. Unfortunately, automated dubbing of video - replacing the audio track of a video with a translated alternative - remains a complex and challenging task: pipelines necessitate precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English, supporting a variety of video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes at https://github.com/davidmchan/Anim400K.
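The core unit of the dataset is a pair of aligned Japanese and English segments drawn from the same video. The sketch below shows one plausible way to represent and iterate over such pairs when building an automated dubbing pipeline; the manifest filename, field names, and directory layout are assumptions for illustration only and are not the release format documented in the repository.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class AlignedSegment:
    """One Japanese/English aligned clip pair (hypothetical schema)."""
    segment_id: str
    ja_video: Path    # source-language clip (Japanese audio)
    en_video: Path    # target-language clip (English dub)
    start_sec: float  # segment offset within the episode
    end_sec: float


def load_segments(manifest_path: Path, data_root: Path) -> Iterator[AlignedSegment]:
    """Yield aligned segments from a JSON-lines manifest.

    The manifest format here is an assumption: one JSON object per line
    with 'id', 'ja_path', 'en_path', 'start', and 'end' fields.
    """
    with manifest_path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield AlignedSegment(
                segment_id=record["id"],
                ja_video=data_root / record["ja_path"],
                en_video=data_root / record["en_path"],
                start_sec=float(record["start"]),
                end_sec=float(record["end"]),
            )


if __name__ == "__main__":
    # Example usage with hypothetical paths.
    for seg in load_segments(Path("anim400k_manifest.jsonl"), Path("data/")):
        print(seg.segment_id, seg.end_sec - seg.start_sec, "seconds")
```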