
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

January 10, 2024
Authors: Kevin Cai, Chonghua Liu, David M. Chan
cs.AI

Abstract

The Internet's wealth of content, with up to 60% of it published in English, stands in stark contrast to the global population, of which only 18.8% speak English and just 5.1% consider it their native language, leading to disparities in online information access. Unfortunately, automated dubbing of video - replacing the audio track of a video with a translated alternative - remains a complex and challenging task, as dubbing pipelines must handle precise timing, facial movement synchronization, and prosody matching. While end-to-end dubbing offers a solution, data scarcity continues to impede the progress of both end-to-end and pipeline-based methods. In this work, we introduce Anim-400K, a comprehensive dataset of over 425K aligned animated video segments in Japanese and English that supports various video-related tasks, including automated dubbing, simultaneous translation, guided video summarization, and genre/theme/style classification. Our dataset is made publicly available for research purposes at https://github.com/davidmchan/Anim400K.
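To make the notion of "aligned segments" concrete, here is a minimal sketch of how paired Japanese/English clips might be represented and iterated over. The metadata schema, file layout, and field names below (e.g. `segments.jsonl`, `ja_audio`, `en_audio`) are hypothetical illustrations, not the repository's actual format; see https://github.com/davidmchan/Anim400K for the real data organization.

```python
# Hypothetical sketch of iterating over aligned Japanese/English clip pairs.
# Schema and paths are assumptions for illustration only.
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class AlignedSegment:
    """One aligned segment: the same clip with Japanese and English audio."""
    segment_id: str
    ja_audio: Path    # Japanese (source) audio track
    en_audio: Path    # English (dubbed) audio track
    video: Path       # shared video track
    start_sec: float  # segment start within the episode
    end_sec: float    # segment end within the episode


def load_segments(metadata_file: Path, data_root: Path) -> Iterator[AlignedSegment]:
    """Yield aligned segments from a JSON-lines metadata file (assumed schema)."""
    with metadata_file.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield AlignedSegment(
                segment_id=record["id"],
                ja_audio=data_root / record["ja_audio"],
                en_audio=data_root / record["en_audio"],
                video=data_root / record["video"],
                start_sec=float(record["start"]),
                end_sec=float(record["end"]),
            )


if __name__ == "__main__":
    root = Path("anim400k")  # hypothetical local copy of the dataset
    for seg in load_segments(root / "segments.jsonl", root):
        print(seg.segment_id, round(seg.end_sec - seg.start_sec, 2), "seconds")
```

A structure like this is what end-to-end dubbing models would consume: the Japanese track as input, the English track as the target, with timing fields available for synchronization-related tasks.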