A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Abstract

Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.