
RIR-Mega: A Large-Scale Simulated Room Impulse Response Dataset for Machine Learning and Room Acoustics Modeling

October 21, 2025
Author: Mandip Goswami
cs.AI

Abstract

Room impulse responses (RIRs) are a core resource for dereverberation, robust speech recognition, source localization, and room acoustics estimation. We present RIR-Mega, a large collection of simulated RIRs described by a compact, machine-friendly metadata schema and distributed with simple tools for validation and reuse. The dataset ships with a Hugging Face Datasets loader, scripts for metadata checks and checksums, and a reference regression baseline that predicts RT60-like targets from waveforms. On a split of 36,000 training and 4,000 validation examples, a small Random Forest trained on lightweight time and spectral features reaches a mean absolute error near 0.013 s and a root mean square error near 0.022 s. We host a subset of 1,000 linear-array RIRs and 3,000 circular-array RIRs on Hugging Face for streaming and quick tests, and preserve the complete 50,000-RIR archive on Zenodo. The dataset and code are public to support reproducible studies.
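To make the RT60 target and the two reported error metrics concrete, here is a minimal sketch in plain Python. It is not the paper's Random Forest baseline. It swaps in classical Schroeder backward integration, applied to a toy exponentially decaying noise signal rather than to RIR-Mega data. The function names (`synth_rir`, `estimate_rt60`, `mae`, `rmse`), the 16 kHz sampling rate, and the -5 dB to -35 dB fit range are all assumptions chosen for illustration.

```python
import math
import random


def synth_rir(rt60, fs=16000, duration=1.0, seed=0):
    """Toy RIR (assumption): Gaussian noise whose envelope falls 60 dB in `rt60` seconds."""
    rng = random.Random(seed)
    # exp(-k * rt60) = 10**-3 (a 60 dB amplitude drop), so k = 3 * ln(10) / rt60.
    k = 3.0 * math.log(10.0) / rt60
    n = int(fs * duration)
    return [rng.gauss(0.0, 1.0) * math.exp(-k * i / fs) for i in range(n)]


def estimate_rt60(rir, fs, start_db=-5.0, end_db=-35.0):
    """Estimate RT60 by fitting a line to the Schroeder energy decay curve."""
    # Backward integration: edc[i] is the energy remaining from sample i onward.
    edc = []
    total = 0.0
    for x in reversed(rir):
        total += x * x
        edc.append(total)
    edc.reverse()

    # Collect the decay curve (in dB re. total energy) inside the fit range.
    times, levels = [], []
    for i, energy in enumerate(edc):
        if energy <= 0.0:
            break
        level = 10.0 * math.log10(energy / edc[0])
        if level < end_db:
            break  # the curve is non-increasing, so nothing later qualifies
        if level <= start_db:
            times.append(i / fs)
            levels.append(level)
    if len(times) < 2:
        raise ValueError("not enough decay range to fit a slope")

    # Ordinary least-squares slope in dB per second.
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den
    return -60.0 / slope  # time needed for a 60 dB drop


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    fs = 16000
    truths = [0.2, 0.3, 0.5, 0.8]  # hypothetical RT60 values in seconds
    preds = [estimate_rt60(synth_rir(t, fs, seed=i), fs) for i, t in enumerate(truths)]
    print("MAE  (s):", mae(truths, preds))
    print("RMSE (s):", rmse(truths, preds))
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest. That is one plausible reading of the reported pair (about 0.013 s versus 0.022 s), though the abstract itself does not break the errors down.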