A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
July 17, 2025
Authors: Kirill Borodin, Nikita Vasiliev, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Oleg Rogov, Grach Mkrtchian
cs.AI
Abstract
Russian speech synthesis presents distinctive challenges, including vowel
reduction, consonant devoicing, variable stress patterns, homograph ambiguity,
and unnatural intonation. This paper introduces Balalaika, a novel dataset
comprising more than 2,000 hours of studio-quality Russian speech with
comprehensive textual annotations, including punctuation and stress markings.
Experimental results show that models trained on Balalaika significantly
outperform those trained on existing datasets in both speech synthesis and
enhancement tasks. We detail the dataset construction pipeline, annotation
methodology, and results of comparative evaluations.