A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Abstract

Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.