

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

February 27, 2026
Author: Mandip Goswami
cs.AI

Abstract

We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus, with stratified splits by reverberation time (RT60) and direct-to-reverberant ratio (DRR). We evaluate five Whisper models (tiny through large-v3) on 1600 test samples and report word error rate (WER) and character error rate (CER) under clean and reverberant conditions. Reverberation consistently degrades performance across all model sizes; the reverb penalty in WER ranges from 0.12 to 1.07 percentage points depending on the model. We release the dataset, evaluation code, and baseline results to support reproducible research on robust ASR.
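The two core operations described above — convolving a clean utterance with a room impulse response, and scoring transcripts by word error rate — can be sketched as follows. This is a minimal illustration, not the released evaluation code; it assumes the clean waveform and RIR are already loaded as mono NumPy arrays at the same sample rate, and the peak-normalization choice is one plausible convention, not necessarily the one used in the benchmark.

```python
import numpy as np

def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a clean waveform with a room impulse response (RIR)
    and rescale the result to the clean signal's peak level, truncating
    to the clean signal's length so the pair stays time-aligned."""
    wet = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(clean)) / peak)
    return wet

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words (substitutions, insertions, deletions
    all cost 1)."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)   # all deletions
    d[0, :] = np.arange(len(hyp) + 1)   # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,        # deletion
                          d[i, j - 1] + 1,        # insertion
                          d[i - 1, j - 1] + sub)  # match/substitution
    return d[len(ref), len(hyp)] / len(ref)
```

The reverb penalty reported in the abstract would then be the difference between `wer` averaged over reverberant transcripts and `wer` averaged over the paired clean transcripts.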