
**SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise**

February 13, 2026
Authors: Yuejie Li, Ke Yang, Yueying Hua, Berlin Chen, Jianhao Nie, Yueping He, Caixin Kang
cs.AI

Abstract

Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix 17 categories of real-world environmental noise under controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.
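The abstract describes mixing environmental noise into synthesized speech "under controlled SNR levels." Below is a minimal sketch of how SNR-controlled mixing is typically done, assuming NumPy waveform arrays; the function name and details are illustrative and not taken from the benchmark's actual pipeline.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech so the result has the target signal-to-noise ratio in dB."""
    # Tile the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Average power (mean squared amplitude) of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Scale the noise so that 10 * log10(p_speech / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Sweeping `snr_db` from high (near-quiet) to low or negative values reproduces the quiet-to-highly-noisy evaluation conditions the benchmark describes.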
February 17, 2026