A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

July 17, 2025
Authors: Kirill Borodin, Nikita Vasiliev, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Oleg Rogov, Grach Mkrtchian
cs.AI

Abstract

Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.
July 21, 2025