

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

February 27, 2026
Author: Mandip Goswami
cs.AI

Abstract

We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus, with stratified splits by reverberation time (RT60) and direct-to-reverberant ratio (DRR). We evaluate five Whisper models (tiny through large-v3) on 1600 test samples and report word error rate (WER) and character error rate (CER) under clean and reverberant conditions. Reverberation consistently degrades performance across all model sizes; the reverb penalty in WER ranges from 0.12 to 1.07 percentage points depending on the model. We release the dataset, evaluation code, and baseline results to support reproducible research on robust ASR.
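The core pairing operation described above, convolving a clean LibriSpeech utterance with a real room impulse response, can be sketched as follows. This is a minimal illustration, not the paper's released pipeline: the helper name `make_reverberant` is hypothetical, and the RMS level matching after convolution is an assumption (it keeps loudness differences from confounding the WER comparison).

```python
import numpy as np

def make_reverberant(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a clean utterance with a room impulse response (RIR),
    truncate to the original length, and rescale to the clean signal's
    RMS level. (Hypothetical helper; level matching is an assumption.)"""
    rev = np.convolve(clean, rir, mode="full")[: len(clean)]  # drop reverb tail
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_rev = np.sqrt(np.mean(rev ** 2)) + 1e-12  # avoid divide-by-zero
    return rev * (rms_clean / rms_rev)
```

Both versions of each utterance would then be transcribed by a Whisper model, and the per-condition WER difference gives the reverb penalty the abstract reports.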