A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
July 17, 2025
Authors: Kirill Borodin, Nikita Vasiliev, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Oleg Rogov, Grach Mkrtchian
cs.AI
Abstract
Russian speech synthesis presents distinctive challenges, including vowel
reduction, consonant devoicing, variable stress patterns, homograph ambiguity,
and unnatural intonation. This paper introduces Balalaika, a novel dataset
comprising more than 2,000 hours of studio-quality Russian speech with
comprehensive textual annotations, including punctuation and stress markings.
Experimental results show that models trained on Balalaika significantly
outperform those trained on existing datasets in both speech synthesis and
enhancement tasks. We detail the dataset construction pipeline, annotation
methodology, and results of comparative evaluations.