

RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation

January 25, 2026
Author: Mandip Goswami
cs.AI

Abstract

Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index (C50) computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69–5.78) on clean speech and 7.70% (7.04–8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06–2.98), a 48% relative degradation. WER increases monotonically with RT60 and decreases with DRR, consistent with prior perceptual studies. While the core finding that reverberation harms recognition is well established, we aim to provide the community with a standardized resource where acoustic conditions are transparent and results can be verified independently. The repository includes one-command rebuild instructions for both Windows and Linux environments.
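The abstract states that C50 and DRR are computed from each source RIR, but the exact procedures live in the released scripts. As a minimal sketch of the conventional definitions (the 2.5 ms direct-path window and the 50 ms early/late split are common defaults assumed here, not necessarily the paper's choices):

```python
import numpy as np

def clarity_c50(rir: np.ndarray, sr: int) -> float:
    """C50 in dB: energy in the first 50 ms after the direct-path
    peak, relative to all energy arriving later."""
    onset = int(np.argmax(np.abs(rir)))   # direct-path arrival
    split = onset + int(0.05 * sr)        # 50 ms early/late boundary
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / late)

def drr(rir: np.ndarray, sr: int, window_ms: float = 2.5) -> float:
    """DRR in dB: energy within +/- window_ms of the direct-path peak,
    relative to the remaining (reverberant) energy."""
    onset = int(np.argmax(np.abs(rir)))
    w = int(window_ms * 1e-3 * sr)
    lo, hi = max(onset - w, 0), onset + w + 1
    direct = np.sum(rir[lo:hi] ** 2)
    reverb = np.sum(rir[:lo] ** 2) + np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / reverb)
```

For an idealized RIR with a unit direct impulse and a single weak late reflection, both metrics reduce to a simple energy ratio in dB, which makes the functions easy to sanity-check against hand computation.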
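The reported 95% confidence interval on the paired WER difference suggests an utterance-level resampling scheme; the paper's exact CI procedure is defined in its evaluation scripts. As an illustration only, a paired percentile bootstrap over utterances (function name and inputs hypothetical) could look like:

```python
import numpy as np

def paired_wer_ci(errs_clean, errs_reverb, ref_words, n_boot=10_000, seed=0):
    """Percentile-bootstrap 95% CI for the paired WER difference.

    errs_clean / errs_reverb: per-utterance edit-error counts,
    ref_words: per-utterance reference word counts.
    Utterances are resampled with replacement, keeping each
    clean/reverberant pair aligned (the paired design).
    """
    rng = np.random.default_rng(seed)
    n = len(ref_words)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample utterance indices
        total = ref_words[idx].sum()
        wer_c = errs_clean[idx].sum() / total
        wer_r = errs_reverb[idx].sum() / total
        diffs[b] = wer_r - wer_c              # paired difference
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi
```

Keeping the pairs aligned inside each resample is what makes the interval on the *difference* much tighter than intervals on the two WERs separately, consistent with the narrow (2.06–2.98) band reported for a 2.50-point gap.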