Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
May 20, 2025
Authors: Tiantian Feng, Jihwan Lee, Anfeng Xu, Yoonjeong Lee, Thanathai Lertpetchpun, Xuan Shi, Helin Wang, Thomas Thebaud, Laureano Moro-Velazquez, Dani Byrd, Najim Dehak, Shrikanth Narayanan
cs.AI
Abstract
We introduce Vox-Profile, a comprehensive benchmark to characterize rich
speaker and speech traits using speech foundation models. Unlike existing works
that focus on a single dimension of speaker traits, Vox-Profile provides
holistic and multi-dimensional profiles that reflect both static speaker traits
(e.g., age, sex, accent) and dynamic speech properties (e.g., emotion, speech
flow). This benchmark is grounded in speech science and linguistics, developed
with domain experts to accurately index speaker and speech characteristics. We
report benchmark experiments using over 15 publicly available speech datasets
and several widely used speech foundation models that target various static and
dynamic speaker and speech properties. In addition to benchmark experiments, we
showcase several downstream applications supported by Vox-Profile. First, we
show that Vox-Profile can augment existing speech recognition datasets to
analyze ASR performance variability. Vox-Profile is also used as a tool to
evaluate the performance of speech generation systems. Finally, we assess the
quality of our automated profiles through comparison with human evaluation and
show convergent validity. Vox-Profile is publicly available at:
https://github.com/tiantiaf0627/vox-profile-release.
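The abstract mentions augmenting speech recognition datasets with trait labels to analyze ASR performance variability. As a rough illustration of that kind of analysis only, the sketch below groups word error rate (WER) by a predicted trait label. It does not use the Vox-Profile code: the function names, the dictionary keys (`reference`, `hypothesis`, `accent`), and the toy utterances are all hypothetical, and the trait labels are assumed to have been produced beforehand by some profiling model.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row-by-row dynamic-programming Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1], len(ref)


def wer_by_trait(utterances, trait):
    """Aggregate WER separately for each value of one trait label."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for utt in utterances:
        e, n = word_errors(utt["reference"], utt["hypothesis"])
        errors[utt[trait]] += e
        words[utt[trait]] += n
    return {label: errors[label] / words[label]
            for label in errors if words[label] > 0}


# Made-up toy data: the "accent" values stand in for labels that a
# trait-profiling model is assumed to have predicted (hypothetical).
utterances = [
    {"reference": "turn the lights on",
     "hypothesis": "turn the lights on", "accent": "A"},
    {"reference": "play some music",
     "hypothesis": "play sum music", "accent": "B"},
    {"reference": "call my mother",
     "hypothesis": "call mother", "accent": "B"},
]

per_accent_wer = wer_by_trait(utterances, "accent")
```

Pooling errors and reference words per group before dividing gives a corpus-level WER for each trait value, which tends to be more stable than averaging per-utterance rates when utterance lengths differ.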