Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
May 20, 2025
Authors: Tiantian Feng, Jihwan Lee, Anfeng Xu, Yoonjeong Lee, Thanathai Lertpetchpun, Xuan Shi, Helin Wang, Thomas Thebaud, Laureano Moro-Velazquez, Dani Byrd, Najim Dehak, Shrikanth Narayanan
cs.AI
Abstract
We introduce Vox-Profile, a comprehensive benchmark to characterize rich
speaker and speech traits using speech foundation models. Unlike existing works
that focus on a single dimension of speaker traits, Vox-Profile provides
holistic and multi-dimensional profiles that reflect both static speaker traits
(e.g., age, sex, accent) and dynamic speech properties (e.g., emotion, speech
flow). This benchmark is grounded in speech science and linguistics, developed
with domain experts to accurately index speaker and speech characteristics. We
report benchmark experiments targeting various static and dynamic speaker and
speech properties, using over 15 publicly available speech datasets and several
widely used speech foundation models. In addition to benchmark experiments, we
showcase several downstream applications supported by Vox-Profile. First, we
show that Vox-Profile can augment existing speech recognition datasets to
analyze ASR performance variability. Vox-Profile is also used as a tool to
evaluate the performance of speech generation systems. Finally, we assess the
quality of our automated profiles through comparison with human evaluation and
show convergent validity. Vox-Profile is publicly available at:
https://github.com/tiantiaf0627/vox-profile-release.
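The abstract does not say how profile labels are joined with ASR outputs to analyze performance variability. As a minimal sketch, assuming each utterance already carries a reference transcript, an ASR hypothesis, and a predicted profile, per-group word error rate (WER) could be computed as below. The `SpeakerProfile` fields and the `word_errors` and `wer_by_trait` helpers are hypothetical names chosen for illustration, not the Vox-Profile API, and the toy transcripts are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpeakerProfile:
    """Hypothetical per-utterance profile; field names are illustrative only."""
    # Static speaker traits
    age_group: str
    sex: str
    accent: str
    # Dynamic speech properties
    emotion: str
    speech_flow: str


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_trait(utterances, trait: str) -> dict:
    """Corpus-level WER for each value of one profile trait.

    `utterances` is an iterable of (reference, hypothesis, SpeakerProfile).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for reference, hypothesis, profile in utterances:
        group = getattr(profile, trait)
        errors[group] += word_errors(reference, hypothesis)
        words[group] += len(reference.split())
    # Skip groups whose references are all empty to avoid dividing by zero
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy usage: invented transcripts and profiles, for illustration only
examples = [
    ("the cat sat", "the cat sat",
     SpeakerProfile("adult", "female", "american", "neutral", "fluent")),
    ("the cat sat", "a cat sat",
     SpeakerProfile("adult", "male", "indian", "neutral", "fluent")),
    ("hello there", "hello",
     SpeakerProfile("senior", "male", "indian", "happy", "disfluent")),
]
print(wer_by_trait(examples, "accent"))  # {'american': 0.0, 'indian': 0.4}
```

Pooling errors and reference words per group (rather than averaging per-utterance WER) keeps short utterances from dominating the group score; the same grouping works for any static or dynamic trait by changing the `trait` argument.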