ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
January 10, 2024
Authors: Kevin Cai, Chonghua Liu, David M. Chan
cs.AI
Abstract
The Internet's wealth of content, with up to 60% published in English,
starkly contrasts the global population, where only 18.8% are English speakers,
and just 5.1% consider it their native language, leading to disparities in
online information access. Unfortunately, automated processes for dubbing of
video - replacing the audio track of a video with a translated alternative -
remain complex and challenging tasks, necessitating precise timing, facial
movement synchronization, and prosody matching. While end-to-end
dubbing offers a solution, data scarcity continues to impede the progress of
both end-to-end and pipeline-based methods. In this work, we introduce
Anim-400K, a comprehensive dataset of over 425K aligned animated video segments
in Japanese and English supporting various video-related tasks, including
automated dubbing, simultaneous translation, guided video summarization, and
genre/theme/style classification. Our dataset is made publicly available for
research purposes at https://github.com/davidmchan/Anim400K.