

RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation

January 25, 2026
Authors: Mandip Goswami
cs.AI

Abstract

Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index (C_{50}) computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69--5.78) on clean speech and 7.70% (7.04--8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06--2.98). This represents a 48% relative degradation. WER increases monotonically with RT60 and decreases with DRR, consistent with prior perceptual studies. While the core finding that reverberation harms recognition is well established, we aim to provide the community with a standardized resource where acoustic conditions are transparent and results can be verified independently. The repository includes one-command rebuild instructions for both Windows and Linux environments.
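The construction described above, convolving a clean utterance with a room impulse response and annotating the result with metrics derived from that RIR, can be approximated with a short sketch. The code below is not the repository's rebuild script: the file names, the 2.5 ms direct-path window used for DRR, and the simple level matching are illustrative assumptions, and RT60 estimation (typically via Schroeder integration of the energy decay curve) is omitted for brevity.

```python
# Minimal sketch (not the authors' pipeline): convolve a clean mono utterance with an
# RIR and compute C50 / DRR from the impulse response. File paths are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def direct_path_index(rir):
    """Index of the strongest tap, taken here as the direct-path arrival."""
    return int(np.argmax(np.abs(rir)))

def clarity_c50(rir, sr):
    """C50 in dB: energy in the first 50 ms after the direct path vs. the remainder."""
    d = direct_path_index(rir)
    cut = d + int(0.050 * sr)
    early = np.sum(rir[d:cut] ** 2)
    late = np.sum(rir[cut:] ** 2)
    return 10.0 * np.log10(early / max(late, 1e-12))

def drr(rir, sr, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB, with an assumed short window around the direct path."""
    d = direct_path_index(rir)
    w = int(direct_window_ms * 1e-3 * sr)
    direct = np.sum(rir[max(d - w, 0):d + w] ** 2)
    reverb = np.sum(rir ** 2) - direct
    return 10.0 * np.log10(direct / max(reverb, 1e-12))

speech, sr = sf.read("clean_utterance.wav")   # placeholder clean LibriSpeech file (mono)
rir, sr_rir = sf.read("rir.wav")              # placeholder simulated RIR from RIR-Mega
assert sr == sr_rir, "resample the RIR to the speech sample rate first"

wet = fftconvolve(speech, rir)[: len(speech)]                        # reverberant version
wet = wet / (np.max(np.abs(wet)) + 1e-12) * np.max(np.abs(speech))   # rough level match

print(f"C50 = {clarity_c50(rir, sr):.1f} dB, DRR = {drr(rir, sr):.1f} dB")
sf.write("reverberant_utterance.wav", wet, sr)
```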
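The abstract quotes paired confidence intervals for the clean-versus-reverberant WER gap but does not spell out the interval procedure; one common way to obtain such intervals is a paired bootstrap over utterances. The snippet below is a generic illustration under that assumption, not the paper's evaluation script, and the array names are hypothetical.

```python
# Illustrative paired bootstrap CI for the WER gap (reverberant minus clean), in points.
# err_clean / err_rev hold per-utterance word-error counts; n_words holds reference word counts.
import numpy as np

rng = np.random.default_rng(0)

def corpus_wer(errors, words):
    """Corpus-level WER in percent from per-utterance error and word counts."""
    return 100.0 * errors.sum() / words.sum()

def paired_bootstrap_ci(err_clean, err_rev, n_words, n_boot=10_000, alpha=0.05):
    """Percentile CI for the paired WER difference, resampling utterances with replacement."""
    n = len(n_words)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        deltas[b] = corpus_wer(err_rev[idx], n_words[idx]) - corpus_wer(err_clean[idx], n_words[idx])
    lo, hi = np.percentile(deltas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Resampling whole utterances keeps each clean/reverberant pair together, so the interval reflects variability across utterances rather than across individual word errors.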