RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling
October 21, 2025
Author: Mandip Goswami
cs.AI
Abstract
Room impulse responses (RIRs) are a core resource for dereverberation, robust
speech recognition, source localization, and room acoustics estimation. We
present RIR-Mega, a large collection of simulated RIRs described by a compact,
machine-friendly metadata schema and distributed with simple tools for
validation and reuse. The dataset ships with a Hugging Face Datasets loader,
scripts for metadata checks and checksums, and a reference regression baseline
that predicts RT60-like targets from waveforms. On a split of 36,000 training
and 4,000 validation examples, a small Random Forest on lightweight time-domain
and spectral features reaches a mean absolute error of about 0.013 s and a
root-mean-square error of about 0.022 s. We host a subset of 1,000 linear-array
RIRs and 3,000 circular-array RIRs on Hugging Face for streaming and quick
tests, and preserve the complete archive of 50,000 RIRs on Zenodo. The dataset
and code are public to support reproducible studies.
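The abstract mentions scripts for metadata checks and checksums but does not describe their format. The following is a minimal sketch, assuming a hypothetical JSON Lines metadata file in which each record names its audio file and carries a SHA-256 digest. The field names in REQUIRED_FIELDS and the helpers load_jsonl, sha256_of and validate_records are illustrative only and are not RIR-Mega's actual schema or tooling.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical schema: these field names are illustrative only and are not
# taken from RIR-Mega's actual metadata.
REQUIRED_FIELDS = {"id": str, "file": str, "sample_rate": int, "sha256": str}


def load_jsonl(path):
    """Read one JSON object per non-empty line (assumed JSON Lines layout)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_records(records, root):
    """Return a list of problems; an empty list means every record passed."""
    problems = []
    root = Path(root)
    for i, rec in enumerate(records):
        # Schema check: every required field present with the expected type.
        errs = [
            f"record {i}: '{name}' missing or not {typ.__name__}"
            for name, typ in REQUIRED_FIELDS.items()
            if not isinstance(rec.get(name), typ)
        ]
        if errs:
            problems.extend(errs)
            continue  # a malformed record cannot be checksummed reliably
        # Integrity check: the referenced file exists and its digest matches.
        path = root / rec["file"]
        if not path.is_file():
            problems.append(f"record {i}: file {rec['file']} not found")
        elif sha256_of(path) != rec["sha256"].lower():
            problems.append(f"record {i}: checksum mismatch for {rec['file']}")
    return problems
```

A call such as `validate_records(load_jsonl("metadata.jsonl"), "audio")` (both paths hypothetical) would return an empty list for a clean download. Separating the schema check from the integrity check keeps a single malformed record from hiding checksum failures elsewhere.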
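The baseline's targets are described only as "RT60-like", and the abstract does not say how they are derived. For orientation, one standard way to estimate RT60 from an RIR is Schroeder backward integration followed by a straight-line fit to the energy decay curve. This sketch uses a T30-style fit between -5 dB and -35 dB, extrapolated to a 60 dB decay. It illustrates that textbook technique and is not presented as the dataset's labeling procedure. The function names and the default fit range are assumptions.

```python
import math


def schroeder_edc_db(rir):
    """Energy decay curve by Schroeder backward integration, in dB re. total energy."""
    edc = []
    running = 0.0
    for x in reversed(rir):  # integrate squared samples from the tail toward t = 0
        running += x * x
        edc.append(running)
    edc.reverse()
    total = edc[0] if edc else 0.0
    if total <= 0.0:
        raise ValueError("RIR has no energy")
    # Clamp to avoid log10(0) if the tail underflows.
    return [10.0 * math.log10(max(v / total, 1e-300)) for v in edc]


def estimate_rt60(rir, fs, start_db=-5.0, end_db=-35.0):
    """T30-style estimate: least-squares line through the EDC between
    start_db and end_db, extrapolated to a 60 dB decay."""
    edc = schroeder_edc_db(rir)
    pts = [(n / fs, d) for n, d in enumerate(edc) if end_db <= d <= start_db]
    if len(pts) < 2:
        raise ValueError("EDC does not span the fitting range")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    num = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = num / den  # decay rate in dB per second (negative for a decaying RIR)
    return -60.0 / slope
```

For a synthetic, noise-free exponential decay, the fit recovers the RT60 used to generate the signal. Simulated or measured RIRs with a noise floor usually need a fit range that stays above that floor, which is one reason T20 or T30 variants are used instead of fitting the full 60 dB.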
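The baseline is scored with mean absolute error and root-mean-square error. The block below is a plain-Python sketch of those two metrics only. The Random Forest and its features are not reproduced, and the predictions passed in are assumed to be RT60-like values in seconds.

```python
import math


def _check(y_true, y_pred):
    """Both sequences must be non-empty and aligned one-to-one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the targets (seconds here)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; squaring the residuals lets large misses dominate."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

With `y_true = [0.40, 0.50, 0.60, 0.70]` and `y_pred = [0.41, 0.49, 0.61, 0.75]`, the MAE is 0.02 s and the RMSE is about 0.026 s, because the single 0.05 s miss weighs more heavily in the RMSE. The RMSE is never smaller than the MAE, and the two are equal only when every absolute error is the same.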