Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

August 3, 2025
Authors: Tiantian Feng, Kevin Huang, Anfeng Xu, Xuan Shi, Thanathai Lertpetchpun, Jihwan Lee, Yoonjeong Lee, Dani Byrd, Shrikanth Narayanan
cs.AI

Abstract

We present Voxlect, a novel benchmark for modeling dialects and regional languages worldwide using speech foundation models. Specifically, we report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study uses over 2 million training utterances from 30 publicly available speech corpora that provide dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We also assess the robustness of the dialectal models under noisy conditions and present an error analysis showing that modeling results align with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by Voxlect. Specifically, we show that Voxlect can be applied to augment existing speech recognition datasets with dialect information, enabling a more detailed analysis of ASR performance across dialectal variations. Voxlect is also used as a tool to evaluate the performance of speech generation systems. Voxlect is publicly available under a RAIL-family license at: https://github.com/tiantiaf0627/voxlect.
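The ASR analysis mentioned in the abstract can be pictured as grouping transcripts by a dialect label and computing word error rate (WER) per group. The following is a minimal sketch of that idea only, not Voxlect's API or the authors' evaluation code: the function names, the `(dialect, reference, hypothesis)` record layout, and the `dialect_a`/`dialect_b` labels are all hypothetical, and in practice the labels would come from a dialect classifier rather than being written by hand.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_dialect(records):
    """Aggregate WER per dialect label.

    `records` is an iterable of (dialect, reference, hypothesis) tuples.
    This layout is an assumption made for illustration.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, reference, hypothesis in records:
        ref_tokens = reference.split()
        hyp_tokens = hypothesis.split()
        errors[dialect] += edit_distance(ref_tokens, hyp_tokens)
        words[dialect] += len(ref_tokens)
    # Skip groups with no reference words to avoid division by zero.
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Hypothetical toy data; the dialect labels are placeholders.
records = [
    ("dialect_a", "the cat sat on the mat", "the cat sat on the mat"),
    ("dialect_a", "turn the light off", "turn the lights off"),
    ("dialect_b", "where is the station", "where the station"),
]
result = wer_by_dialect(records)
print(result)
```

Pooling errors and reference words per group before dividing, instead of averaging per-utterance WER, keeps short utterances from dominating a group's score.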